THE ANNALS 
of 
MATHEMATICAL 
STATISTICS 


(FOUNDED BY H. C. CARVER) 


THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS 


Contents

The Distribution Theory of Runs. A. M. Mood
A Generalization of the Law of Large Numbers. Hilda Geiringer
Conditions for Uniqueness in the Problem of Moments. M. G. Kendall
On Samples from a Normal Bivariate Population. C. T. Hsu
... when the Expected Marginal Totals are Known. W. Edwards Deming and Frederick F. Stephan

Notes:
The Standard Errors of the Geometric Means and their Application to Index Numbers. Nilan Norris
A Note on the Use of a Pearson Type III Function in Renewal Theory.
Estimates of Parameters by Means of Least Squares. Evan Johnson, Jr.

The Teaching of Statistics
Address. Harold Hotelling
Discussion. W. Edwards Deming
Resolutions on the Teaching of Statistics

Report of the Hanover Meeting of the Institute





THE ANNALS
OF MATHEMATICAL STATISTICS

EDITED BY
S. S. WILKS, Editor

A. T. CRAIG        J. NEYMAN

WITH THE COOPERATION OF

H. C. Carver       R. A. Fisher       R. de Mises
H. Cramér          T. C. Fry          E. S. Pearson
W. E. Deming       H. Hotelling       H. L. Rietz
G. Darmois         W. A. Shewhart

The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore,
Md. Subscriptions, renewals, orders for back numbers and other business
communications should be sent to the ANNALS OF MATHEMATICAL STATISTICS,
Mt. Royal & Guilford Aves., Baltimore, Md., or to the Secretary of the
Institute of Mathematical Statistics, P. R. Rider, Washington University,
St. Louis, Mo.

Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts
should be typewritten double-spaced with wide margins, and the original copy
should be submitted. Footnotes should be reduced to a minimum and whenever
possible replaced by a bibliography at the end of the paper; formulae in
footnotes should be avoided. Figures, charts, and diagrams should be drawn on
plain white paper or tracing cloth in black India ink twice the size they are
to be printed. Authors are requested to keep in mind typographical difficulties
of complicated mathematical formulae.

Authors will ordinarily receive only galley proofs. Fifty reprints without
covers will be furnished free. Additional reprints and covers furnished at cost.

The subscription price for the ANNALS is $4.00 per year. Single copies $1.25.
Back numbers are available at the following rates:

Vols. I-IV $5.00 each. Single numbers $1.50.
Vols. V to date $4.00 each. Single numbers $1.25.

Entered as second-class matter at the Post Office at Baltimore, Maryland, under the Act of March 3, 1879.











THE DISTRIBUTION THEORY OF RUNS 
By A. M. Mood


1. Introduction. In studying a particular sample, the order in which the 
elements of the sample were drawn is frequently available to the statistician. 
This important information is usually entirely neglected by him. Such dis- 
regard must be attributed, to a considerable extent, to the unsatisfactory state 
of mathematical devices for using the knowledge in question. One reasonable 
mathematical method for handling this information, the one to be used in this 
paper, is to make use of the distribution of runs. A run is defined as a succession 
of similar events preceded and succeeded by different events; the number of
elements in a run will be referred to as its length. 

The distribution theory of runs has had a stormy career. The theory seems 
to have been started toward the end of the nineteenth century rather than in the 
days of Laplace when there was so much interest in games of chance. In 1897 
Karl Pearson [1], in a discussion of data taken from the roulette tables at Monte 
Carlo, wrote "...the theory of runs is a very simple one." In this book he
developed no theory but it is evident from his computations that he regarded the 
distribution of runs as a special case of the multinomial distribution. The 
multinomial method, besides evading the issue somewhat and raising questions 
of random sampling, also gives incorrect results when one is interested in runs 
of more than one kind of element. In 1899 Karl Marbe [2] derived an expression 
for the mean of the number of iterations of a given length from a binomial 
population. This result was incorrect because he neglected dependence between
overlapping iterations. An iteration is defined as a sequence of similar events; a
run of length $t$ is counted as $t - s + 1$ iterations of length $s$ for $s \le t$. Marbe
has assembled a great mass of data with the object of proving the popular
hypothesis that a "head" becomes highly probable after a long succession of
"tails" has appeared. Ordinary significance tests applied to his data do not
support this contention, but Marbe continues to advocate it [3] and [5]. Of
course, he has been severely criticised by many mathematical statisticians.

In 1904 Grünbaum [6] derived the mean of the number of runs of given length
from a binomial population by the multinomial method. The first correct 
formulae were derived in 1906 by Bruns [7] who found the mean and variance of 
the number of iterations of given length in samples from a binomial population. 
In a book published in 1917 von Bortkiewicz correctly derived for the first time 
the mean and variance of runs from a binomial population using a method similar 
to that of Bruns. This book [8] contains a great many formulae for means and 
variances of runs and iterations under various special circumstances; a large 
portion of it is devoted to an exhaustive criticism of Marbe's work. In 1921 von
Mises [9] showed that the number of long runs of given length was approximately
distributed according to the Poisson law for large samples. 

It was not until 1925 (so far as the author has been able to ascertain) that an 
actual distribution function appeared when Ising [10] gave the number of ways of 
obtaining a given total number of runs (without regard to length) from
arrangements of two kinds of elements. Stevens [12] in 1939 published the same
distribution and described a $\chi^2$ criterion for significance. Wald and Wolfowitz [13]
in 1940 published the same distribution and showed that it was asymptotically 
normal. These papers are all concerned with random arrangements of a fixed 
number of elements of each of two kinds; the last mentioned paper describes a 
very interesting application of the distribution to the problem of testing the 
hypothesis that two samples have come from the same continuous distribution. 
Wishart and Hirshfeld [11] in 1936 derived the distribution of the total number of 
runs (again without regard to length) in samples from a binomial population and 
showed it was asymptotically normal. 

In this paper we shall derive distributions of runs of given length both from 
random arrangements of fixed numbers of elements of two or more kinds, and 
from binomial and multinomial populations. Also we shall give the limiting 
form of these distributions as the sample size increases. These limiting
distributions are all normal. The distribution problem is, of course, a
combinatorial one, and the whole development depends on some identities in
combinatory analysis, some new and some well known to students of partition theory.

The paper will be divided into two parts. The first will deal with
distributions obtained from random arrangements of a fixed number of each kind of
element. The second will deal with distributions of elements from a binomial
or multinomial population.


Part I 


2. Distribution of runs of two kinds of elements. Consider random arrangements
of $n$ elements of two kinds, for example $n_1$ $a$'s and $n_2$ $b$'s with $n_1 + n_2 = n$.
Let $r_{1i}$ denote the number of runs of $a$'s of length $i$, and let $r_{2i}$ denote the number
of runs of $b$'s of length $i$. For example the arrangement

abbabaaabbaaa

will be characterized by the numbers $r_{11} = 2$, $r_{13} = 2$, $r_{21} = 1$, $r_{22} = 2$, and all
other $r_{ij} = 0$. Also we let $r_1 = \sum_i r_{1i}$ and $r_2 = \sum_i r_{2i}$ denote the total number of
runs of $a$'s and $b$'s respectively. Throughout the paper a binomial coefficient
will be denoted by

(2.1)  $\binom{m}{k} = \dfrac{m!}{k!\,(m-k)!}$

and this is defined to be zero when $m < k$. A multinomial coefficient will often
be denoted by

(2.2)  $\begin{bmatrix} m \\ m_i \end{bmatrix} = \dfrac{m!}{m_1!\, m_2! \cdots m_k!}$,

(2.3)  $\sum_i m_i = m, \qquad m_i \ge 0$,

and when such a coefficient is to be summed over the indices $m_i$ the two
conditions (2.3) are always understood and will not be repeated; other conditions on
the indices will be placed below the summation sign.

Given a set of numbers $r_{ij}$ ($i = 1, 2$; $j = 1, 2, \dots, n_i$) such that $\sum_j j\, r_{ij} = n_i$,
there are $\begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix}$ and $\begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix}$ different arrangements of the runs of $a$'s and $b$'s
respectively. Hence the total number of ways of obtaining the set $r_{ij}$ is

(2.4)  $N(r_{ij}) = \begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix} F(r_1, r_2)$

where $F(r_1, r_2)$ is the number of ways of arranging $r_1$ objects of one kind and $r_2$
objects of another so that no two adjacent objects are of the same kind. Thus

(2.5)  $F(r_1, r_2) = 0$ if $|r_1 - r_2| > 1$,
       $F(r_1, r_2) = 1$ if $|r_1 - r_2| = 1$,
       $F(r_1, r_2) = 2$ if $r_1 = r_2$.

Since there are $\binom{n}{n_1}$ possible arrangements of the $a$'s and $b$'s, we have at once the
distribution of the $r_{ij}$

(2.6)  $P(r_{ij}) = \dfrac{\begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix} F(r_1, r_2)}{\binom{n}{n_1}}$.
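As an illustration of (2.4) to (2.6), the following minimal Python sketch counts the runs of the example arrangement and compares the count $N(r_{ij})$ from (2.4) with a brute-force enumeration of all $\binom{n}{n_1}$ arrangements. The function names (`run_counts`, `n_ways`, `brute_force`) are illustrative choices made here, not notation from the paper.

```python
from collections import Counter
from itertools import combinations, groupby
from math import comb, factorial


def run_counts(seq):
    """Return (Counter of a-run lengths, Counter of b-run lengths)."""
    counts = {"a": Counter(), "b": Counter()}
    for symbol, group in groupby(seq):
        counts[symbol][len(list(group))] += 1
    return counts["a"], counts["b"]


def multinomial(parts):
    """(sum of parts)! divided by the product of the part factorials."""
    out = factorial(sum(parts))
    for p in parts:
        out //= factorial(p)
    return out


def F(r1, r2):
    """Equation (2.5): ways to alternate r1 runs of a's with r2 runs of b's."""
    gap = abs(r1 - r2)
    return 0 if gap > 1 else (1 if gap == 1 else 2)


def n_ways(ra, rb):
    """Equation (2.4) for run-length counters ra (a's) and rb (b's)."""
    r1, r2 = sum(ra.values()), sum(rb.values())
    return multinomial(ra.values()) * multinomial(rb.values()) * F(r1, r2)


def brute_force(ra, rb, n1, n2):
    """Count arrangements of n1 a's and n2 b's having exactly these runs."""
    n = n1 + n2
    hits = 0
    for positions in combinations(range(n), n1):
        chosen = set(positions)
        seq = "".join("a" if i in chosen else "b" for i in range(n))
        hits += run_counts(seq) == (ra, rb)
    return hits


example = "abbabaaabbaaa"
ra, rb = run_counts(example)
n1, n2 = example.count("a"), example.count("b")
formula = n_ways(ra, rb)
enumerated = brute_force(ra, rb, n1, n2)
print(dict(ra), dict(rb))
print(formula, enumerated, comb(n1 + n2, n1))
```

Dividing `formula` by `comb(n1 + n2, n1)` gives the probability (2.6) for this particular set of run counts.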


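Certain marginal distributions will also be of interest. To obtain, for example,
the distribution of the $r_{1j}$, it is first necessary to sum $\begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix}$ over all partitions of
$n_2$. This is easily accomplished by finding the coefficient of $x^{n_2}$ in

$(x + x^2 + x^3 + \cdots)^{r_2} = x^{r_2}(1 + x + x^2 + \cdots)^{r_2} = \dfrac{x^{r_2}}{(1-x)^{r_2}} = x^{r_2} \sum_{t=0}^{\infty} \binom{r_2 - 1 + t}{r_2 - 1} x^t$.

The term corresponding to $t = n_2 - r_2$ gives the desired result:

(2.7)  $\sum_{r_{2j}} \begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix} = \binom{n_2 - 1}{r_2 - 1}$.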
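The identity (2.7) can be checked numerically for small cases. The sketch below is only an illustration under our own naming: it enumerates the partitions of $n_2$ into exactly $r_2$ parts, sums the corresponding multinomial coefficients, and compares the total with $\binom{n_2 - 1}{r_2 - 1}$.

```python
from collections import Counter
from math import comb, factorial


def partitions(total, parts, largest=None):
    """Yield partitions of `total` into exactly `parts` positive parts,
    written in non-increasing order."""
    if largest is None:
        largest = total
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(min(largest, total - parts + 1), 0, -1):
        for rest in partitions(total - first, parts - 1, first):
            yield (first,) + rest


def multinomial_sum(n2, r2):
    """Left side of (2.7): sum of r2!/(r_21! r_22! ...) over all partitions."""
    total = 0
    for partition in partitions(n2, r2):
        coeff = factorial(r2)
        # r_2j is the number of parts (runs) equal to j in this partition.
        for multiplicity in Counter(partition).values():
            coeff //= factorial(multiplicity)
        total += coeff
    return total


checks = [
    (n2, r2, multinomial_sum(n2, r2), comb(n2 - 1, r2 - 1))
    for n2 in range(1, 9)
    for r2 in range(1, n2 + 1)
]
print(all(lhs == rhs for _, _, lhs, rhs in checks))
```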





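We have then

(2.8)  $P(r_{1j}, r_2) = \begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \binom{n_2 - 1}{r_2 - 1} F(r_1, r_2) \Big/ \binom{n}{n_1}$,

and summing this over $r_2$, a slight simplification gives

(2.9)  $P(r_{1j}) = \begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \binom{n_2 + 1}{r_1} \Big/ \binom{n}{n_1}$.

The distribution (2.6) summed over $r_{1j}$ and $r_{2j}$ gives by means of (2.7)

(2.10)  $P(r_1, r_2) = \binom{n_1 - 1}{r_1 - 1} \binom{n_2 - 1}{r_2 - 1} F(r_1, r_2) \Big/ \binom{n}{n_1}$,

which is essentially the distribution derived by Wald and Wolfowitz [13], and
summing this over $r_2$ we get the distribution discussed by Stevens [12]

(2.11)  $P(r_1) = \binom{n_1 - 1}{r_1 - 1} \binom{n_2 + 1}{r_1} \Big/ \binom{n}{n_1}$.

Another marginal distribution which will be useful is obtained by summing
(2.9) over $r_{1j}$ for $j \ge k$. If we let

$s_{1j} = r_{1j}$ ($j < k$),  $s_{1k} = \sum_{j=k}^{n_1} r_{1j}$,  $A = \sum_{j=1}^{k-1} j\, r_{1j}$,

we must then sum the multinomial coefficient

$\dfrac{s_{1k}!}{r_{1k}! \cdots r_{1n_1}!}$

over all partitions of $n_1 - A$ such that every part is greater than $k - 1$. This
is given by the coefficient of $x^{n_1 - A}$ in

$(x^k + x^{k+1} + \cdots)^{s_{1k}} = \dfrac{x^{k s_{1k}}}{(1-x)^{s_{1k}}} = x^{k s_{1k}} \sum_{t=0}^{\infty} \binom{s_{1k} - 1 + t}{s_{1k} - 1} x^t$,

hence

(2.12)  $\sum \dfrac{s_{1k}!}{r_{1k}! \cdots r_{1n_1}!} = \binom{n_1 - A - (k-1)s_{1k} - 1}{s_{1k} - 1}$,

where $\sum$ denotes summation over all positive integers $r_{1k}, r_{1,k+1}, \dots, r_{1n_1}$
such that $\sum_{i=k}^{n_1} i\, r_{1i} = n_1 - A$. This identity with (2.9) gives

(2.13)  $P(s_{1j}) = \begin{bmatrix} s_1 \\ s_{1j} \end{bmatrix} \binom{n_1 - A - (k-1)s_{1k} - 1}{s_{1k} - 1} \binom{n_2 + 1}{s_1} \Big/ \binom{n}{n_1}$.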
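The distribution (2.13) lends itself to a direct numerical check. The sketch below, written under assumptions of our own, evaluates (2.13) with exact fractions and compares it with an enumeration of all arrangements for small $n_1$, $n_2$ and $k$. The helper names are hypothetical, and the treatment of the case $s_{1k} = 0$ (one way if the short runs already use all $n_1$ elements, none otherwise) is a convention assumed here rather than something stated in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, groupby
from math import comb, factorial


def binom(m, k):
    """Binomial coefficient, taken as zero outside 0 <= k <= m."""
    return comb(m, k) if 0 <= k <= m else 0


def p_s(s, n1, n2):
    """Equation (2.13) for s = (s_11, ..., s_1k)."""
    k = len(s)
    s1, sk = sum(s), s[-1]
    A = sum(j * s[j - 1] for j in range(1, k))
    multinom = factorial(s1)
    for value in s:
        multinom //= factorial(value)
    if sk == 0:
        long_ways = 1 if A == n1 else 0  # convention assumed in this sketch
    else:
        long_ways = binom(n1 - A - (k - 1) * sk - 1, sk - 1)
    return Fraction(multinom * long_ways * binom(n2 + 1, s1), comb(n1 + n2, n1))


def brute_force(n1, n2, k):
    """Exact distribution of (s_11, ..., s_1k) by enumerating arrangements."""
    n = n1 + n2
    tally = Counter()
    for positions in combinations(range(n), n1):
        chosen = set(positions)
        seq = ["a" if i in chosen else "b" for i in range(n)]
        s = [0] * k
        for symbol, group in groupby(seq):
            if symbol == "a":
                # Runs of length k or more are pooled into the last cell.
                s[min(len(list(group)), k) - 1] += 1
        tally[tuple(s)] += 1
    total = comb(n, n1)
    return {key: Fraction(count, total) for key, count in tally.items()}


exact = brute_force(6, 4, 3)
agrees = all(p_s(s, 6, 4) == prob for s, prob in exact.items())
print(agrees, sum(exact.values()))
```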


Another useful distribution analogous to (2.13) is derived by considering runs
of both kinds of elements. If we define $s_{2j}$ ($j = 1, 2, \dots, h$) and $B$ in terms of
$r_{2j}$ just as $s_{1j}$ and $A$ were defined above, it follows at once from (2.6) and (2.12)
that

(2.14)  $P(s_{1i}, s_{2j}) = \begin{bmatrix} s_1 \\ s_{1i} \end{bmatrix} \begin{bmatrix} s_2 \\ s_{2j} \end{bmatrix} \binom{n_1 - A - (k-1)s_{1k} - 1}{s_{1k} - 1} \binom{n_2 - B - (h-1)s_{2h} - 1}{s_{2h} - 1} F(s_1, s_2) \Big/ \binom{n}{n_1}$,
        $i = 1, 2, \dots, k$;  $j = 1, 2, \dots, h$.

These last two distributions should be the most useful for applications. The
long runs have been added together to form the new variables $s_{1k}$ and $s_{2h}$, thus
decreasing materially the number of variables as compared with (2.6) and (2.9)
while at the same time little information is lost. One is free to choose $k$ and $h$
so that the number of variables is appropriate for the data at hand. Moreover,
it is shown in Section 5 that these variables are asymptotically normally
distributed so that one may apply a simple $\chi^2$ test of significance for "randomness of
elements with respect to order" when dealing with large samples. We shall
then be able to test whether a sample has been "randomly" drawn in a certain
sense.


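3. Moments for runs of two kinds of elements. Instead of dealing with the
ordinary moments we shall obtain formulae for the factorial moments because
the expressions are much more compact. As is customary, a factorial will be
denoted by

(3.1)  $x^{(a)} = x(x-1)(x-2) \cdots (x-a+1)$,

and $x^{(0)}$ is defined to be 1. Of course the ordinary moments are determined by
the factorial moments by means of relations of the type

$x^a = \sum_{i=0}^{a} C_i^a\, x^{(i)}$.

A recent discussion of the coefficients $C_i^a$ has been given by Joseph [14]. The
mathematical expectation of a function $f(r)$ will be denoted by

(3.2)  $E(f(r)) = \sum_r f(r) P(r)$.

Of course $E$ is a linear operator. We shall require the following identity

(3.3)  $\sum \begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \prod_i r_{1i}^{(a_i)} = r_1^{(\sum a_i)} \binom{n_1 - \sum i a_i - 1}{r_1 - \sum a_i - 1}$,

where $\sum$ denotes summation over all positive integers $r_{11}, r_{12}, \dots, r_{1n_1}$ such
that $\sum_{i=1}^{n_1} i\, r_{1i} = n_1$. (3.3) may be verified by differentiating

$g(t) = (t_1 x + t_2 x^2 + \cdots + t_{n_1} x^{n_1})^{r_1}$

$a_i$ times with respect to $t_i$ ($i = 1, 2, \dots, n_1$), then finding the coefficient of
$x^{n_1}$ after putting $t_i = 1$. The identity (3.3) enables us to find the factorial
moments of the variables in the distribution (2.9) for we have

(3.4)  $E\big(\prod_i r_{1i}^{(a_i)}\big) = \sum \prod_i r_{1i}^{(a_i)} \begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \binom{n_2+1}{r_1} \Big/ \binom{n}{n_1}$
       $= \sum_{r_1} r_1^{(\sum a_i)} \binom{n_1 - \sum i a_i - 1}{r_1 - \sum a_i - 1} \binom{n_2+1}{r_1} \Big/ \binom{n}{n_1}$
       $= \sum_{r_1} (n_2+1)^{(\sum a_i)} \binom{n_1 - \sum i a_i - 1}{r_1 - \sum a_i - 1} \binom{n_2 + 1 - \sum a_i}{r_1 - \sum a_i} \Big/ \binom{n}{n_1}$
       $= (n_2+1)^{(\sum a_i)} \binom{n - \sum i a_i - \sum a_i}{n_2 - \sum a_i} \Big/ \binom{n}{n_1}$.

The sum on $r_1$ involved in the last step is given by the identity

(3.5)  $\sum_i \binom{A}{i} \binom{B}{C - i} = \binom{A+B}{C}$

which is readily obtained by equating coefficients of $x^C$ in

$(1+x)^A (1+x)^B = (1+x)^{A+B}$.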


2B 
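We shall give here the means, variances and covariances obtained from (3.4)

(3.6)  $E(r_{1i}) = \dfrac{(n_2+1)^{(2)}\, n_1^{(i)}}{n^{(i+1)}}$,

(3.7)  $\sigma_{ij} = \dfrac{(n_2+1)^{(2)}\, n_2^{(2)}\, n_1^{(i+j)}}{n^{(i+j+2)}} - \dfrac{(n_2+1)^{(2)} (n_2+1)^{(2)}\, n_1^{(i)}\, n_1^{(j)}}{n^{(i+1)}\, n^{(j+1)}}$,

(3.8)  $\sigma_{ii} = \dfrac{(n_2+1)^{(2)}\, n_2^{(2)}\, n_1^{(2i)}}{n^{(2i+2)}} + \dfrac{(n_2+1)^{(2)}\, n_1^{(i)}}{n^{(i+1)}} \left(1 - \dfrac{(n_2+1)^{(2)}\, n_1^{(i)}}{n^{(i+1)}}\right)$.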
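The mean (3.6) and the second-order moments that follow from (3.4) can be compared with exact values obtained by enumeration. The following sketch is an illustration only: `ff` implements the factorial power (3.1), and the function names are chosen here. It assumes $n \ge i + j + 2$ so that no factorial power in a denominator vanishes.

```python
from fractions import Fraction
from itertools import combinations, groupby


def ff(x, a):
    """Factorial power x^(a) = x (x-1) ... (x-a+1), with x^(0) = 1."""
    out = 1
    for t in range(a):
        out *= x - t
    return out


def mean_formula(n1, n2, i):
    """E(r_1i) from (3.6)."""
    n = n1 + n2
    return Fraction(ff(n2 + 1, 2) * ff(n1, i), ff(n, i + 1))


def cov_formula(n1, n2, i, j):
    """Covariance of r_1i and r_1j (the variance when i == j), from (3.4)."""
    n = n1 + n2
    joint = Fraction(ff(n2 + 1, 2) * ff(n2, 2) * ff(n1, i + j), ff(n, i + j + 2))
    out = joint - mean_formula(n1, n2, i) * mean_formula(n1, n2, j)
    if i == j:
        # Var(r) = E(r^(2)) + E(r) - E(r)^2
        out += mean_formula(n1, n2, i)
    return out


def exact_moments(n1, n2, i, j):
    """Exact mean of r_1i and covariance of (r_1i, r_1j) by enumeration."""
    n = n1 + n2
    count = sum_i = sum_j = sum_ij = 0
    for positions in combinations(range(n), n1):
        chosen = set(positions)
        is_a = [p in chosen for p in range(n)]
        lengths = [len(list(g)) for flag, g in groupby(is_a) if flag]
        ri, rj = lengths.count(i), lengths.count(j)
        count += 1
        sum_i += ri
        sum_j += rj
        sum_ij += ri * rj
    mean_i, mean_j = Fraction(sum_i, count), Fraction(sum_j, count)
    return mean_i, Fraction(sum_ij, count) - mean_i * mean_j


n1, n2 = 5, 4
results = [
    exact_moments(n1, n2, i, j) == (mean_formula(n1, n2, i), cov_formula(n1, n2, i, j))
    for i, j in [(1, 1), (1, 2), (2, 3), (3, 3)]
]
print(results)
```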











These will be needed in the section dealing with asymptotic distributions. The
moments for the distribution (2.6) follow at once from (3.3) as

(3.9)  $E\big(\prod_{i,j} r_{1i}^{(a_i)} r_{2j}^{(b_j)}\big) = \sum_{r_1, r_2} r_1^{(\sum a_i)}\, r_2^{(\sum b_j)} \binom{n_1 - \sum i a_i - 1}{r_1 - \sum a_i - 1} \binom{n_2 - \sum j b_j - 1}{r_2 - \sum b_j - 1} F(r_1, r_2) \Big/ \binom{n}{n_1}$.

The summation on $r_2$ is accomplished by putting $r_2 = r_1 - 1$, $r_1$, and $r_1 + 1$,
but after that has been done it is necessary to expand the product of the two
factorial factors in factorial powers of the lower index of one of the binomial
coefficients. This is easily done for the first few moments, but there appears
to be no simple expression for the general case. The means, variances and
covariances of $r_{1i}$ are given by (3.6), (3.7) and (3.8) and those of $r_{2i}$ are obtained
from these equations by interchanging $n_1$ and $n_2$. The other covariances are

(3.10)  $\sigma_{r_{1i} r_{2j}} = \dfrac{n_1^{(i+2)}\, n_2^{(j+2)}}{n^{(i+j+2)}} + \dfrac{4\, n_1^{(i+1)}\, n_2^{(j+1)}}{n^{(i+j+1)}} + \dfrac{2\, n_1^{(i)}\, n_2^{(j)}}{n^{(i+j)}} - \dfrac{(n_1+1)^{(2)} (n_2+1)^{(2)}\, n_1^{(i)}\, n_2^{(j)}}{n^{(i+1)}\, n^{(j+1)}}$.


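A slight variation of the method above will give the moments of the $s_{1i}$ in
the distribution (2.13). An accent on a summation sign will indicate that the
term corresponding to $i = k$ is to be omitted. Differentiating

$g(t_i) = [t_1 x + t_2 x^2 + \cdots + t_{k-1} x^{k-1} + t_k (x^k + x^{k+1} + \cdots)]^{s_1}$

$a_i$ times with respect to $t_i$ and finding the coefficient of $x^{n_1}$ after putting $t_i = 1$,
we obtain

(3.11)  $\sum \begin{bmatrix} s_1 \\ s_{1j} \end{bmatrix} \binom{n_1 - A - (k-1)s_{1k} - 1}{s_{1k} - 1} \prod_i s_{1i}^{(a_i)} = s_1^{(\sum a_i)} \binom{n_1 - \sum i a_i + a_k - 1}{s_1 - \sum' a_i - 1}$.

This with (2.13) gives by the same steps as used in obtaining (3.4)

(3.12)  $E\big(\prod_i s_{1i}^{(a_i)}\big) = (n_2+1)^{(\sum a_i)} \binom{n - \sum i a_i - \sum' a_i}{n_2 - \sum' a_i} \Big/ \binom{n}{n_1}$.

The first two moments are

(3.13)  $E(s_{1k}) = \dfrac{(n_2+1)\, n_1^{(k)}}{n^{(k)}}$,

(3.14)  $\sigma_{ik} = \dfrac{n_2 (n_2+1)^{(2)}\, n_1^{(i+k)}}{n^{(i+k+1)}} - \dfrac{(n_2+1)(n_2+1)^{(2)}\, n_1^{(i)}\, n_1^{(k)}}{n^{(i+1)}\, n^{(k)}}$,

(3.15)  $\sigma_{kk} = \dfrac{(n_2+1)^{(2)}\, n_1^{(2k)}}{n^{(2k)}} + \dfrac{(n_2+1)\, n_1^{(k)}}{n^{(k)}} \left(1 - \dfrac{(n_2+1)\, n_1^{(k)}}{n^{(k)}}\right)$.

The others are, of course, given by (3.6), (3.7) and (3.8).
The joint moments of the variables in (2.14) as obtained from (3.11) are

$E\big(\prod_{i,j} s_{1i}^{(a_i)} s_{2j}^{(b_j)}\big) = \sum_{s_1, s_2} s_1^{(\sum a_i)}\, s_2^{(\sum b_j)} \binom{n_1 - \sum i a_i + a_k - 1}{s_1 - \sum' a_i - 1} \binom{n_2 - \sum j b_j + b_h - 1}{s_2 - \sum' b_j - 1} F(s_1, s_2) \Big/ \binom{n}{n_1}$.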


In addition to the covariances (3.10) we shall need 


(k+2) — (j+1) (k+1) _ (j+1) (k+1) _ (9) (k) . (3) 
nm ne +2ni Nn’ 49% ny +n; Nn,’ 


(3.16) 


— net) 
(3.17) , 
— (a + Oe + 1)? nf nf 
n® nG+) ’ 
" ara ni” an” (ni + 1) (ne + 1)n” ni” 
(8.18) oo = -_aee eee 





ner) neth—) n® n™ 


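The moments of $r_1$ in the distribution (2.11) may be derived easily by means
of (3.5) as

(3.19)  $E(r_1^{(a)}) = (n_2+1)^{(a)} \binom{n-a}{n_1-a} \Big/ \binom{n}{n_1}$,

from which

(3.20)  $E(r_1) = \dfrac{(n_2+1)\, n_1}{n}$,

(3.21)  $\sigma^2 = \dfrac{(n_2+1)^{(2)}\, n_1^{(2)}}{n\, n^{(2)}}$.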









4. Distribution and moments of runs of k kinds of elements. This section
is a generalization of the preceding two sections to several kinds of elements.
The case $k = 2$ was treated separately because the special character of the
function $F(r_1, r_2)$ in this instance made the distribution comparatively simple.
Now we shall be interested in $k$ kinds of elements denoted by $a_1, \dots, a_k$ and
we shall suppose there are $n_i$ elements of the $i$th kind. We let $r_{ij}$ denote the
number of runs of elements of the $i$th kind of length $j$, and put

$n = \sum_{i=1}^{k} n_i, \qquad r_i = \sum_{j=1}^{n_i} r_{ij}$.

The same argument as was used in deriving (2.6) gives

(4.1)  $P(r_{ij}) = \prod_{i=1}^{k} \begin{bmatrix} r_i \\ r_{ij} \end{bmatrix} F(r_1, r_2, \dots, r_k) \Big/ \begin{bmatrix} n \\ n_i \end{bmatrix}$

where the function $F(r_1, r_2, \dots, r_k)$, which will be referred to hereafter simply
as $F(r_i)$, represents the number of different arrangements of $r_1$ objects of one
kind, $r_2$ objects of a second kind, and so forth, such that no two adjacent objects
are of the same kind. We shall be able to give the explicit expression for $F(r_i)$
after examining the marginal distribution $P(r_i)$. This is obtained by summing
(4.1) over $r_{ij}$ with $r_i$ fixed by means of the identity (2.7) giving

(4.2)  $P(r_i) = \prod_{i=1}^{k} \binom{n_i - 1}{r_i - 1} F(r_i) \Big/ \begin{bmatrix} n \\ n_i \end{bmatrix}$.


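Despite our present meager knowledge of $F(r_i)$ it is possible to find the
moments of the $r_i$ as distributed by (4.2). Since $\sum P(r_i) = 1$, we have the
identity

(4.3)  $\sum \prod_i \binom{n_i - 1}{r_i - 1} F(r_i) = \begin{bmatrix} n \\ n_i \end{bmatrix}$.

From this the moments are easily derived. If we put

(4.4)  $u_i = n_i - r_i$

we have

$\sum \prod_i u_i^{(a_i)} \binom{n_i - 1}{r_i - 1} F(r_i) = \sum \prod_i (n_i - r_i)^{(a_i)} \binom{n_i - 1}{r_i - 1} F(r_i)$
$\qquad = \sum \prod_i (n_i - 1)^{(a_i)} \prod_i \binom{n_i - a_i - 1}{r_i - 1} F(r_i)$
$\qquad = \prod_i (n_i - 1)^{(a_i)} \sum \prod_i \binom{n_i - a_i - 1}{r_i - 1} F(r_i)$
$\qquad = \prod_i (n_i - 1)^{(a_i)} \begin{bmatrix} n - \sum a_i \\ n_i - a_i \end{bmatrix}$.

The summation involved in the last step is given by (4.3). On dividing the
last equation by $\begin{bmatrix} n \\ n_i \end{bmatrix}$ we get the factorial moments of the $u_i$

(4.5)  $E\big(\prod_i u_i^{(a_i)}\big) = \prod_i (n_i - 1)^{(a_i)} \begin{bmatrix} n - \sum a_i \\ n_i - a_i \end{bmatrix} \Big/ \begin{bmatrix} n \\ n_i \end{bmatrix}$.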


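From these equations the moments of the $r_i$ may be found; the means, variances
and covariances are

(4.6)  $E(r_i) = \dfrac{n_i (n - n_i + 1)}{n}$,

(4.7)  $\sigma_{ij} = \dfrac{n_i^{(2)}\, n_j^{(2)}}{n\, n^{(2)}}$,

(4.8)  $\sigma_{ii} = \dfrac{n_i^{(2)} (n - n_i + 1)^{(2)}}{n\, n^{(2)}}$.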
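The moments (4.6) to (4.8) can be confirmed for a small case by enumerating every distinct arrangement of the elements. The sketch below is a hedged illustration: the element counts `{"a": 3, "b": 2, "c": 2}` and all function names are choices made here, and the formulas coded in `formula_moments` are the mean $n_i(n - n_i + 1)/n$, the covariance $n_i^{(2)} n_j^{(2)} / (n\, n^{(2)})$ and the variance $n_i^{(2)} (n - n_i + 1)^{(2)} / (n\, n^{(2)})$.

```python
from fractions import Fraction
from itertools import groupby, permutations


def ff(x, a):
    """Factorial power x^(a) = x (x-1) ... (x-a+1)."""
    out = 1
    for t in range(a):
        out *= x - t
    return out


def run_totals(seq, kinds):
    """Total number of runs of each kind in one arrangement."""
    totals = dict.fromkeys(kinds, 0)
    for symbol, _ in groupby(seq):
        totals[symbol] += 1
    return totals


def exact_moments(counts):
    """Exact means and covariances of the r_i over all distinct arrangements."""
    kinds = list(counts)
    pool = "".join(kind * c for kind, c in counts.items())
    arrangements = set(permutations(pool))
    m = len(arrangements)
    rows = [run_totals(seq, kinds) for seq in arrangements]
    mean = {a: Fraction(sum(r[a] for r in rows), m) for a in kinds}
    cov = {
        (a, b): Fraction(sum(r[a] * r[b] for r in rows), m) - mean[a] * mean[b]
        for a in kinds
        for b in kinds
    }
    return mean, cov


def formula_moments(counts):
    """Means, variances and covariances in closed form."""
    n = sum(counts.values())
    mean = {a: Fraction(c * (n - c + 1), n) for a, c in counts.items()}
    cov = {}
    for a, ca in counts.items():
        for b, cb in counts.items():
            if a == b:
                cov[a, b] = Fraction(ff(ca, 2) * ff(n - ca + 1, 2), n * ff(n, 2))
            else:
                cov[a, b] = Fraction(ff(ca, 2) * ff(cb, 2), n * ff(n, 2))
    return mean, cov


counts = {"a": 3, "b": 2, "c": 2}
print(exact_moments(counts) == formula_moments(counts))
```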


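It is clear that

(4.9)  $\varphi(t) = $ Coefficient of $\prod_{i=1}^{k} x_i^{n_i}$ in
       $(x_1 + \cdots + x_k)^k \prod_{i=1}^{k} (x_1 + \cdots + x_{i-1} + t_i x_i + x_{i+1} + \cdots + x_k)^{n_i - 1} \Big/ \begin{bmatrix} n \\ n_i \end{bmatrix}$

is a generating function for the moments of the variables $u_i$. This generating
function will enable us to find the exact expression for $F(r_i)$ for we have

$P(u_i = n_i - r_i) = $ Coefficient of $\prod_{i=1}^{k} t_i^{n_i - r_i}$ in $\varphi(t)$,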

ee ee 


eee Ee ne 


P(ui) = II rir od / oo 


and equating the expressions on the right of the last two equations we have 
e igi") 
(4.10) P(r) = seeuberl nil ne 
a 
1 (% =) 
k)rtn- " 
ll - 
_ oui, [e | | Ni; 


iMG GRP j—2; 


Also 


in which the prime on the $n_{ij}$ indicates that the indices corresponding to $j = i$
are to be omitted; hence $i$ takes all the values $1, 2, \dots, k$ and $j$ takes all values
$1, 2, \dots, k$ except $i$ because the index $n_{ii}$ has been cancelled with $n_i - r_i$ in
the binomial coefficient in the denominator of (4.10). It is clear from (4.11)
that $F(r_i)$ may be expressed as follows

(4.12)  $F(r_i) = \mathrm{CT} \prod_{i=1}^{k} x_i^{-r_i}\, (x_1 + \cdots + x_k)^k\, (x_2 + x_3 + \cdots + x_k)^{r_1 - 1} (x_1 + x_3 + \cdots + x_k)^{r_2 - 1} \cdots (x_1 + x_2 + \cdots + x_{k-1})^{r_k - 1}$

in which "CT" is an abbreviation for "constant term of."
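The constant-term expression (4.12) is equivalent to taking the coefficient of $\prod x_i^{r_i}$ in $(x_1 + \cdots + x_k)^k \prod_i (\sum_{j \ne i} x_j)^{r_i - 1}$. The sketch below, an illustration under our own naming and valid for $r_i \ge 1$, extracts that coefficient with a small dictionary-based polynomial product and compares it with a direct count of arrangements having no two adjacent objects alike.

```python
from itertools import permutations


def poly_mul(p, q):
    """Multiply two polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return out


def linear_form(k, skip=None):
    """x_1 + ... + x_k, optionally omitting the variable with 0-based index `skip`."""
    return {tuple(int(j == i) for j in range(k)): 1 for i in range(k) if i != skip}


def F_constant_term(r):
    """F(r_1, ..., r_k) read off from (4.12); assumes every r_i >= 1."""
    k = len(r)
    poly = {(0,) * k: 1}
    for _ in range(k):
        poly = poly_mul(poly, linear_form(k))
    for i, ri in enumerate(r):
        for _ in range(ri - 1):
            poly = poly_mul(poly, linear_form(k, skip=i))
    # Constant term after multiplying by prod x_i^(-r_i).
    return poly.get(tuple(r), 0)


def F_brute(r):
    """Count distinct arrangements with no two adjacent objects of the same kind."""
    symbols = [i for i, ri in enumerate(r) for _ in range(ri)]
    arrangements = set(permutations(symbols))
    return sum(all(a != b for a, b in zip(s, s[1:])) for s in arrangements)


cases = [(2, 2), (3, 2), (1, 1, 1), (2, 1, 1), (2, 2, 1), (3, 2, 2)]
print([(r, F_constant_term(r), F_brute(r)) for r in cases])
```

For $k = 2$ the same routine returns the values 2, 1 and 0 of (2.5).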





We are now in a position to obtain moments of the variables $r_{ij}$ in the
distribution (4.1) by means of identities similar to (4.3). As an illustration we compute


%-6@e-1 k ni—1 af ee | k — 
BCE DiC) BC-2- DAG) 
k k 

CT Ta" I (a, + cee + t2;+ bes +2] 


ty=n0 


k 
= CT IT ti" * (ty + ees + ry)” “(te + oes + 2)" 
_f[n (n _ mn) 
ia 2n@ 

or 


_— k as aoe — ».\() 
an E(SISTT) (a7) Fo = Se 
. n;! 


1 


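The moments of $r_{ij}$ may be computed from identities of this type together with
(3.3). The first two moments are

(4.14)  $E(r_{ij}) = (n - n_i + 1)^{(2)}\, n_i^{(j)} / n^{(j+1)}$,

(4.15)  $E(r_{ij}^{(2)}) = n_i^{(2j)} (n - n_i)^{(2)} (n - n_i + 1)^{(2)} / n^{(2j+2)}$,

(4.16)  $E(r_{ij} r_{it}) = n_i^{(j+t)} (n - n_i)^{(2)} (n - n_i + 1)^{(2)} / n^{(j+t+2)}, \qquad j \ne t$,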
' avn . (2) (2) 
E(rijra) = (mi —j — (a — t— 1) a Ms -F + YO — t +1) 


+ 2n—n, —nJ)(n;s-j+)a—-t+)(n1—-t+n-j 
+ (n—n;—n,)°[(n, —t+ 1° + 2; —F + Di. —t + 1) 
+ (n; — 7 +1)°] + 2(n — 0; — 0.) (n; —j +, — t + 2) 


(J—1) ., (81) 


+ (n —n; —n,)®} + 2(n; —j — 1) (nm —F +) 


nit) 


-(n, = t+ 1)° + (n -— ae = n,)(2(n; —j + 1)(n, —ft + 1) 


(4.17) ; 7 
+ (n, —t+1)9) + mr —1, -— n,)°R@, —t+1) + -j+ 1) 
(3-1) , (t—1) 
+ (n = mj —n,)°} + 2(n, — t — 1) E(u - 8 +0) 


(nm —G+)%+ a —n —n) 2; -F+ (in -—t4+) 
+ (n-—G +) 4+ rn —n — 2)? 2, -—7 +1) + (m, —t4+)] 
(j-1) . (#1) 


tm —n)?} +42 (us -§ + DO -— t+) 


+(n-—n—n)(n,-—j+tn—t+2)+nm—n —n,)}. 





Such a lengthy expression as this last one can hardly be useful to the statistician,
and for this reason we shall not define variables $s_{ij}$ analogous to the $s_{1i}$ and $s_{2j}$
of Section 2 and take the time and space to find their moments.


5. Asymptotic distributions. We shall show that some of the distributions
obtained previously are asymptotically normal when the $n_i$ become large in
such a way that the ratios $n_i/n$ remain fixed. The description "asymptotically
normal" means that the distribution approaches the normal distribution
uniformly over any finite region as $n_i \to \infty$. The ratios $n_i/n$ will be denoted by
$e_i$, hence $\sum e_i = 1$. The symbol $O(1/n^a)$ will represent any function such that

$\lim_{n \to \infty} n^a\, O(1/n^a) = l < \infty$.

We shall not, of course, be able to get any limit theorems for distributions
like (2.6) or (2.9) because the number of independent variables increases with
$n$. We shall consider first the distribution (2.13) whose asymptotic character
is given in the following theorem.

THEOREM 1. The variables

(5.1)  $x_i = \dfrac{s_{1i} - n e_1^i e_2^2}{\sqrt{n}}$  ($i < k$),  $\qquad x_k = \dfrac{s_{1k} - n e_1^k e_2}{\sqrt{n}}$

are asymptotically normally distributed with zero means and variances and
covariances

(5.2)  $\sigma_{ij} = e_1^{i+j-1} e_2^3 [(i+1)(j+1) e_1 e_2 - ij e_2 - 2e_1]$,  $i, j < k$, $i \ne j$,
       $\sigma_{ii} = e_1^{2i-1} e_2^3 [(i+1)^2 e_1 e_2 - i^2 e_2 - 2e_1] + e_1^i e_2^2$,  $i < k$,
       $\sigma_{ik} = e_1^{i+k-1} e_2^2 [(i+1)k e_1 e_2 - ik e_2 - e_1]$,  $i < k$,
       $\sigma_{kk} = e_1^{2k-1} e_2 [k^2 (e_1 - 1) e_2 - e_1] + e_1^k e_2$.

The limiting means, variances and covariances are obtained from the relations
(3.6), (3.7), (3.8), (3.13), (3.14) and (3.15).


To demonstrate this theorem we make the substitutions

(5.3)  $n_1 = n e_1$,  $n_2 = n e_2$,
       $s_{1i} = n e_1^i e_2^2 + \sqrt{n}\, x_i$  ($i < k$),
       $s_{1k} = n e_1^k e_2 + \sqrt{n}\, x_k$,
       $s_1 = n e_1 e_2 + \sqrt{n} \sum_{i=1}^{k} x_i$,
       $A = n(e_1 - e_1^{k+1} - k e_1^k e_2) + \sqrt{n} \sum_{i=1}^{k-1} i x_i$

in (2.13), and estimate the factorials by means of Stirling's formula

(5.4)  $m! = \sqrt{2\pi}\, m^{m + \frac{1}{2}} e^{-m} \left(1 + O\!\left(\tfrac{1}{m}\right)\right)$.

The result is an unwieldy expression which we shall not present at the moment.
First we note that the exponential factors cancel out because the sum of the
lower indices of a binomial or multinomial coefficient is equal to the upper index.
Also we simplify the expression by considering in detail only terms which involve
the $x_i$; the normalizing constant can be determined from the final limit function.
Any function of the parameters will be represented by the letter $K$. Thus in
(5.4) we need consider only the factor $m^{m + \frac{1}{2}}$. All factorials will be of the form

(5.5)  $m = na + \sqrt{n}\, L(x) + b$

where $L(x)$ is a linear function of the $x_i$, and $a$ and $b$ are independent of $n$ and
$x_i$. Now

$m^{m + \frac{1}{2}} = (na + \sqrt{n} L(x) + b)^{na + \sqrt{n} L(x) + b + \frac{1}{2}}$
$\qquad = (na)^{na + \sqrt{n} L(x) + b + \frac{1}{2}} \left(1 + \dfrac{L(x)}{a\sqrt{n}} + \dfrac{b}{an}\right)^{na + \sqrt{n} L(x) + b + \frac{1}{2}}$

and

$\log m^{m + \frac{1}{2}} = K + \sqrt{n} L(x) \log na + (na + \sqrt{n} L(x) + b + \tfrac{1}{2}) \log\left(1 + \dfrac{L(x)}{a\sqrt{n}} + \dfrac{b}{an}\right)$
$\qquad = K + \sqrt{n} L(x) \log na + (na + \sqrt{n} L(x) + b + \tfrac{1}{2}) \left(\dfrac{L(x)}{a\sqrt{n}} + \dfrac{b}{an} - \dfrac{L^2(x)}{2a^2 n} + O\!\left(\dfrac{1}{n^{3/2}}\right)\right)$
$\qquad = K + \sqrt{n} L(x)(1 + \log na) + \dfrac{L^2(x)}{2a} + O\!\left(\dfrac{1}{\sqrt{n}}\right)$,

(5.6)  $m^{m + \frac{1}{2}} = K (na)^{\sqrt{n} L(x)}\, e^{\sqrt{n} L(x) + L^2(x)/2a} \left(1 + O\!\left(\dfrac{1}{\sqrt{n}}\right)\right)$,

so terms arising from $b$ (and $b + \frac{1}{2}$ in the exponent) will be neglected as they
give rise only to terms independent of the $x_i$ or of order $1/\sqrt{n}$. Of course
$\log(1 + O(1/m)) = O(1/m)$. Thus, keeping significant terms only, the result of the
substitutions (5.3) and (5.4) in (2.13) after taking logarithms and using (5.6) is


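(5.7)  $-\log P = K + \sqrt{n} \sum_{i=1}^{k-1} x_i (\log n e_1^i e_2^2 + 1) + \dfrac{1}{2} \sum_{i=1}^{k-1} \dfrac{x_i^2}{e_1^i e_2^2}$
       $\quad - \sqrt{n} \Big(\sum_{i=1}^{k} x_i\Big) (\log n e_2^2 + 1) + \dfrac{1}{2 e_2^2} \Big(\sum_{i=1}^{k} x_i\Big)^2$
       $\quad + \sqrt{n} \Big(\sum_{i=1}^{k-1} i x_i + (k-1) x_k\Big) (\log n e_1^k + 1) - \dfrac{1}{2 e_1^k} \Big(\sum_{i=1}^{k-1} i x_i + (k-1) x_k\Big)^2$
       $\quad + 2\sqrt{n}\, x_k (\log n e_1^k e_2 + 1) + \dfrac{x_k^2}{e_1^k e_2} - \sqrt{n} \Big(\sum_{i=1}^{k} i x_i\Big) (\log n e_1^{k+1} + 1)$
       $\quad + \dfrac{1}{2 e_1^{k+1}} \Big(\sum_{i=1}^{k} i x_i\Big)^2 + O\!\left(\dfrac{1}{\sqrt{n}}\right)$.

The coefficients of $x_i$ ($i < k$) and $x_k$ are

$\sqrt{n} (\log n e_1^i e_2^2 + 1 - \log n e_2^2 - 1 + i \log n e_1^k + i - i \log n e_1^{k+1} - i) = 0$,
$\sqrt{n} (-\log n e_2^2 - 1 + (k-1) \log n e_1^k + (k-1) + 2 \log n e_1^k e_2 + 2 - k \log n e_1^{k+1} - k) = 0$.

Hence only the quadratic terms remain and we have

(5.8)  $-\log P = K + \dfrac{1}{2} \sum \sigma^{ij} x_i x_j + O\!\left(\dfrac{1}{\sqrt{n}}\right)$

where

(5.9)  $\sigma^{ij} = \dfrac{1}{e_2^2} + \dfrac{ij\, e_2}{e_1^{k+1}}$,  $i, j < k$, $i \ne j$,
       $\sigma^{ii} = \dfrac{1}{e_1^i e_2^2} + \dfrac{1}{e_2^2} + \dfrac{i^2 e_2}{e_1^{k+1}}$,  $i < k$,
       $\sigma^{ik} = \dfrac{1}{e_2^2} + \dfrac{i + i(k-1) e_2}{e_1^{k+1}}$,  $i < k$,
       $\sigma^{kk} = \dfrac{1}{e_2^2} + \dfrac{2}{e_1^k e_2} + \dfrac{k^2 - (k-1)^2 e_1}{e_1^{k+1}}$.

It is merely a matter of straightforward multiplication of the two matrices to
verify that $\|\sigma^{ij}\|$ is the inverse of $\|\sigma_{ij}\|$, hence is a positive definite matrix.
The details of the verification will be omitted. We have then

(5.10)  $P = K e^{-\frac{1}{2} \sum \sigma^{ij} x_i x_j} \left(1 + O\!\left(\dfrac{1}{\sqrt{n}}\right)\right)$.

In this equation $K$ must necessarily contain the factor $\left(\dfrac{1}{\sqrt{n}}\right)^k$ because there
are $k + 5$ factorials in the denominator and 5 in the numerator of (2.13).
Since $\Delta x_i = 1/\sqrt{n}$, this factor, in view of (5.1), may be replaced by $\prod \Delta x_i$, so

(5.11)  $P = K e^{-\frac{1}{2} \sum \sigma^{ij} x_i x_j} \prod \Delta x_i \left(1 + O\!\left(\dfrac{1}{\sqrt{n}}\right)\right)$.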
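The statement that $\|\sigma^{ij}\|$ is the inverse of $\|\sigma_{ij}\|$ can be checked with exact rational arithmetic. The sketch below is an illustration, not the omitted verification itself. The covariance entries are those of Theorem 1, namely $e_1^{i+j-1} e_2^3 [(i+1)(j+1) e_1 e_2 - ij e_2 - 2e_1]$ for $i, j < k$ (plus $e_1^i e_2^2$ on the diagonal), $e_1^{i+k-1} e_2^2 [(i+1)k e_1 e_2 - ik e_2 - e_1]$ for the pairs $(i, k)$, and $e_1^{2k-1} e_2 [k^2(e_1 - 1)e_2 - e_1] + e_1^k e_2$ for $(k, k)$. The inverse entries are the coefficients of the quadratic terms obtained from the Stirling expansion, written out explicitly in `sigma_inverse`. Function names are our own.

```python
from fractions import Fraction


def sigma(k, e1):
    """Limiting covariance matrix of x_1, ..., x_k in Theorem 1."""
    e2 = 1 - e1
    S = [[None] * k for _ in range(k)]
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            if i < k and j < k:
                v = e1 ** (i + j - 1) * e2 ** 3 * (
                    (i + 1) * (j + 1) * e1 * e2 - i * j * e2 - 2 * e1
                )
                if i == j:
                    v += e1 ** i * e2 ** 2
            elif i == k and j == k:
                v = e1 ** (2 * k - 1) * e2 * (k * k * (e1 - 1) * e2 - e1) + e1 ** k * e2
            else:
                m = min(i, j)
                v = e1 ** (m + k - 1) * e2 ** 2 * (
                    (m + 1) * k * e1 * e2 - m * k * e2 - e1
                )
            S[i - 1][j - 1] = v
    return S


def sigma_inverse(k, e1):
    """Coefficients of the quadratic form left after the Stirling expansion."""
    e2 = 1 - e1
    base = 1 / e2 ** 2
    T = [[None] * k for _ in range(k)]
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            if i < k and j < k:
                v = base + i * j * e2 / e1 ** (k + 1)
                if i == j:
                    v += 1 / (e1 ** i * e2 ** 2)
            elif i == k and j == k:
                v = base + 2 / (e1 ** k * e2) + (k * k - (k - 1) ** 2 * e1) / e1 ** (k + 1)
            else:
                m = min(i, j)
                v = base + m * (1 + (k - 1) * e2) / e1 ** (k + 1)
            T[i - 1][j - 1] = v
    return T


def is_inverse_pair(k, e1):
    """True if sigma(k, e1) times sigma_inverse(k, e1) is the identity."""
    S, T = sigma(k, e1), sigma_inverse(k, e1)
    return all(
        sum(S[i][m] * T[m][j] for m in range(k)) == (1 if i == j else 0)
        for i in range(k)
        for j in range(k)
    )


print([is_inverse_pair(k, Fraction(1, 3)) for k in (1, 2, 3)])
```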


If we restrict the $x_i$ to any finite region $R$ in the $x$-space, the function $O(1/\sqrt{n})$
approaches zero uniformly as $n \to \infty$. Thus, if $A_i < B_i$ are any positive
numbers such that the corresponding values of $x_i$, say $a_i$ and $b_i$, obtained by
substituting $A_i$ and $B_i$ for $s_{1i}$ in (5.1), determine a rectangular region $R'(a_i < x_i < b_i)$
which lies in $R$, we have

(5.12)  $\sum_{s_{1i} = A_i}^{B_i} P(s_{1i}) = \sum_{x_i = a_i}^{b_i} K e^{-\frac{1}{2} \sum \sigma^{ij} x_i x_j} \prod \Delta x_i \left(1 + O\!\left(\dfrac{1}{\sqrt{n}}\right)\right)$
        $\qquad \to \int_{a_i}^{b_i} K e^{-\frac{1}{2} \sum \sigma^{ij} x_i x_j} \prod dx_i \quad$ as $n \to \infty$,

by the definition of a definite integral and Riemann's fundamental theorem.
We have given some details of this proof in order that it may serve as a model
for other theorems of a similar nature which will appear later, and for which
a complete proof will not be given. Two immediate consequences of Theorem 1
will now be stated as corollaries.
COROLLARY 1. The variable

$x = \dfrac{r - n e_1 e_2}{\sqrt{n}\, e_1 e_2}$,

where $r$ is the total number of runs of one kind of element, is asymptotically normally
distributed with zero mean and unit variance. The limiting mean and variance
were computed from (3.20) and (3.21).

COROLLARY 2. The variable $Q = \sum \sigma^{ij} x_i x_j$ is asymptotically distributed
according to the $\chi^2$-law with $k$ degrees of freedom.
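Corollary 1 suggests a simple large-sample statistic for the number of runs of one kind of element. The following sketch is one possible way to compute it, with $e_1 = n_1/n$ taken from the observed sequence as in the definition of the $e_i$. The function name `runs_of_kind_z` and the test sequences are illustrative choices, and elements of any other kind are simply treated as the second kind.

```python
from itertools import groupby
from math import sqrt


def runs_of_kind_z(seq, kind):
    """Standardized number of runs of `kind`: (r - n e1 e2) / (sqrt(n) e1 e2)."""
    n = len(seq)
    n1 = sum(1 for s in seq if s == kind)
    e1 = n1 / n
    e2 = 1 - e1
    # Count maximal blocks of consecutive elements equal to `kind`.
    r = sum(1 for flag, _ in groupby(s == kind for s in seq) if flag)
    return (r - n * e1 * e2) / (sqrt(n) * e1 * e2)


alternating = "ab" * 50            # far more runs than expected under randomness
clustered = "a" * 50 + "b" * 50    # far fewer runs than expected
print(runs_of_kind_z(alternating, "a"), runs_of_kind_z(clustered, "a"))
```

Large positive or negative values, referred to the standard normal distribution, indicate departure from randomness of order. The approximation is an asymptotic one and is not meant for very short sequences, and the function fails by division by zero if the sequence contains only one kind of element.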
In exactly the same manner in which Theorem 1 was deduced from (2.13),
we may prove the following theorem corresponding to the distribution (2.14).

THEOREM 2. The variables

(5.13)  $x_i = \dfrac{s_{1i} - n e_1^i e_2^2}{\sqrt{n}}$,  $i < k$,
        $x_k = \dfrac{s_{1k} - n e_1^k e_2}{\sqrt{n}}$,
        $y_i = \dfrac{s_{2i} - n e_1^2 e_2^i}{\sqrt{n}}$,  $i < h$,

are asymptotically normally distributed with zero means and variances and
covariances

(5.14)  $\sigma_{x_i x_j} = e_1^{i+j-1} e_2^3 [(i+1)(j+1) e_1 e_2 - ij e_2 - 2e_1]$,  $i, j < k$,
        $\sigma_{x_i x_i} = e_1^{2i-1} e_2^3 [(i+1)^2 e_1 e_2 - i^2 e_2 - 2e_1] + e_1^i e_2^2$,  $i < k$,
        $\sigma_{x_i x_k} = e_1^{i+k-1} e_2^2 [(i+1)k e_1 e_2 - ik e_2 - e_1]$,  $i < k$,
        $\sigma_{x_k x_k} = e_1^{2k-1} e_2 [-k^2 e_2^2 - e_1] + e_1^k e_2$,
        $\sigma_{y_i y_j} = e_2^{i+j-1} e_1^3 [(i+1)(j+1) e_1 e_2 - ij e_1 - 2e_2]$,  $i, j < h$,
        $\sigma_{y_i y_i} = e_2^{2i-1} e_1^3 [(i+1)^2 e_1 e_2 - i^2 e_1 - 2e_2] + e_2^i e_1^2$,  $i < h$,
        $\sigma_{x_i y_j} = e_1^{i+1} e_2^{j+1} [(i+1)(j+1) e_1 e_2 - 2i e_2 - 2j e_1 + 2]$,  $i < k$, $j < h$,
        $\sigma_{x_k y_j} = e_1^{k+1} e_2^{j} [k(j+1) e_1 e_2 - 2(k-1) e_2 - (j-1) e_1]$,  $j < h$.

These limiting variances were computed from the variances and covariances
given in Section 3. We have chosen the variable $s_{2h}$ of (2.14) as the dependent
variable. The proof of this theorem is omitted. From it the following
corollaries are deduced immediately.
COROLLARY 3. If $u_i = x_i$ and $u_{k+i} = y_i$ of (5.13) and $\|\sigma^{ij}\|$ ($i, j = 1, 2,
\dots, k + h - 1$) denotes the inverse of (5.14), then the variable $Q = \sum \sigma^{ij} u_i u_j$ is
asymptotically distributed according to the $\chi^2$-law with $k + h - 1$ degrees of freedom.

COROLLARY 4. If $s_i = s_{1i} + s_{2i}$ denotes the total number of runs of both kinds of
elements of length $i$, and $s_k$ the total number of runs of length greater than $k - 1$,
then the variables

(5.15)  $x_i = \dfrac{s_i - n(e_1^i e_2^2 + e_2^i e_1^2)}{\sqrt{n}}$,  $i < k$,
        $x_k = \dfrac{s_k - n(e_1^k e_2 + e_2^k e_1)}{\sqrt{n}}$

are asymptotically normally distributed with zero means and variances and
covariances

(5.16)  $\sigma_{ij} = \sigma_{x_i x_j} + \sigma_{x_i y_j} + \sigma_{x_j y_i} + \sigma_{y_i y_j}$.

We have put $h = k$ in Theorem 2 to obtain this result. The terms on the right
of (5.16) are defined by (5.14); terms which do not appear there may be found
by interchanging $e_1$ and $e_2$ in one of the relations. For example $\sigma_{y_k y_k}$ is given by
interchanging $e_1$ and $e_2$ in the fourth equation of the set (5.14).

COROLLARY 5. The variable $Q = \sum \sigma^{ij} x_i x_j$ where the $x_i$ are defined by (5.15)
and $\|\sigma^{ij}\|$ is the inverse of (5.16), is asymptotically distributed according to the
$\chi^2$-law with $k$ degrees of freedom.

COROLLARY 6. If $s$ denotes the total number of runs of both kinds of elements,
then the variable

$z = \dfrac{s - 2n e_1 e_2}{2\sqrt{n}\, e_1 e_2}$

is asymptotically normally distributed with zero mean and unit variance. This is
the result derived by Wald and Wolfowitz [13].


6. Asymptotic distributions for k kinds of elements. We now investigate the
asymptotic character of the distribution (4.2)

(6.1)  $P(r_i) = \prod_{i=1}^{k} \binom{n_i - 1}{r_i - 1} F(r_i) \Big/ \begin{bmatrix} n \\ n_i \end{bmatrix}$

where $r_i$ is the total number of runs of the $i$th kind of element.

THEOREM 1. If $k > 2$, the variables

(6.2)  $x_i = \dfrac{r_i - n e_i (1 - e_i)}{\sqrt{n}}$

are asymptotically normally distributed with zero means and variances and
covariances

(6.3)  $\sigma_{ij} = e_i^2 e_j^2, \qquad \sigma_{ii} = e_i^2 (1 - e_i)^2$.

The restriction $k > 2$ is made because in the case $k = 2$ the correlation between
the two variables approaches one, and the numbers $\sigma_{ij}$ are all equal. The result
may be called a degenerate normal distribution and might be included in the
theorem in this sense; we have chosen to omit it because this case is better taken
care of by Corollary 1 of the previous section.

The proof of this theorem will be simplified if in the moments (4.5) we replace
the numbers $n_i - 1$ by $n_i$. This substitution will not, of course, affect the
limiting moments. Hence we consider the variables $v_i$ with moments given by

(6.4)  $E\big(\prod_i v_i^{(a_i)}\big) = \prod_i n_i^{(a_i)} \begin{bmatrix} n - \sum a_i \\ n_i - a_i \end{bmatrix} \Big/ \begin{bmatrix} n \\ n_i \end{bmatrix}$

and shall show that

(6.5)  $y_i = \dfrac{v_i - n e_i^2}{\sqrt{n}}$

are asymptotically normally distributed with zero means and variances and
covariances (6.3). It is possible to prove this statement by showing that the
characteristic function (Fourier transform) obtained by substituting $i\theta_j$ for $t_j$
in the moment generating function





k 
gn(t;) = Coef. of [] x?* in 
(6.6) . 


k 
II (ai + e+) faa + ha + tigi + --- +a)" /[™] 


1 nN; 


approaches 


$\varphi(\theta_j) = e^{-\frac{1}{2} \sum \sigma_{ij} \theta_i \theta_j}$

as $n \to \infty$. This method is not appropriate for proving a similar theorem
which appears in Part II, and we prefer to give here a demonstration that will
suffice for both theorems.


In order to prove our theorem we consider the general term in the coefficient
of $\prod x_i^{n_i}$ in (6.6)


ils}n/{ 
(6.7) C(mi) = IT — Te 7 
in which

(6.8)  $\sum_{i=1}^{k} m_{ij} = n_j$

must be required as well as the usual restriction on indices of a multinomial
coefficient, $\sum_{j=1}^{k} m_{ij} = n_i$. Therefore only $(k-1)^2$ of the indices are independent.
Clearly $m_{ii} = v_i$. Now without concerning ourselves about the statistical
significance of the variables $m_{ij}$, let us consider their distribution

(6.9)  $D(m_{ij}) = \prod_{i=1}^{k} \begin{bmatrix} n_i \\ m_{ij} \end{bmatrix} \Big/ \begin{bmatrix} n \\ n_i \end{bmatrix}$

in which the variables corresponding to the values $i, j = 1, 2, \dots, k - 1$ will
be chosen as the independent ones. We shall now prove a theorem from
which Theorem 1 follows immediately.

THEOREM 2. The variables 


_-— neé;e; p me 
(6.10) Sag 0 eine é fw i, & «-+,8=1 
2 a/n ’ ’ 

are asymptotically normally distributed with zero means and variances and co- 
variances given by 

Cijpa = Ci jpeg, 
(6.11) Gijip = — es(1 — exezey, 

Gij,53 = e€(1 — e)(1 — e;). 
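First it is to be noted that the moments of the $m_{ij}$ are easily obtained from the
identity

(6.12) $\sum \prod_{i} \dfrac{n_i!}{\prod_j m_{ij}!} = \dfrac{n!}{\prod_i n_i!}$

as follows, $x^{(a)}$ denoting a factorial power:

$\sum \prod_{i,j} m_{ij}^{(a_{ij})} \prod_i \dfrac{n_i!}{\prod_j m_{ij}!} = \prod_i n_i^{(\sum_j a_{ij})} \sum \prod_i \dfrac{(n_i - \sum_j a_{ij})!}{\prod_j (m_{ij} - a_{ij})!} = \prod_i n_i^{(\sum_j a_{ij})} \dfrac{(n - \sum_{i,j} a_{ij})!}{\prod_j (n_j - \sum_i a_{ij})!},$

and on dividing this last relation by the multinomial coefficient $n!/\prod_i n_i!$ we obtain

(6.13) $E\left(\prod_{i,j} m_{ij}^{(a_{ij})}\right) = \prod_i n_i^{(\sum_j a_{ij})} \prod_j n_j^{(\sum_i a_{ij})} \Big/ n^{(\sum_{i,j} a_{ij})}$

from which the moments (6.11) and the means in (6.10) were computed.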
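The identity (6.12) and the first moment that follows from it can be checked by brute force on a small case. The sketch below is an illustration only, under the assumption that the sum in (6.12) runs over all square tables $m_{ij}$ whose row sums and column sums both equal the $n_i$; the helper names `tables` and `check_identity` are hypothetical and do not come from the paper.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def multinomial(total, parts):
    """Return total! / (parts[0]! * parts[1]! * ...)."""
    value = factorial(total)
    for part in parts:
        value //= factorial(part)
    return value


def compositions(total, k):
    """Yield every k-tuple of non-negative integers that sums to total."""
    if k == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, k - 1):
            yield (first,) + rest


def tables(margins):
    """Yield square tables whose row sums and column sums both equal margins."""
    k = len(margins)
    row_choices = [list(compositions(n_i, k)) for n_i in margins]
    for table in product(*row_choices):
        if all(sum(row[j] for row in table) == margins[j] for j in range(k)):
            yield table


def check_identity(margins):
    """Return (sum of weights, n!/prod n_i!, E(m_11), n_1^2/n)."""
    n = sum(margins)
    total_weight = 0
    weighted_m11 = 0
    for table in tables(margins):
        weight = 1
        for n_i, row in zip(margins, table):
            weight *= multinomial(n_i, row)  # one multinomial coefficient per row
        total_weight += weight
        weighted_m11 += weight * table[0][0]
    return (total_weight, multinomial(n, margins),
            Fraction(weighted_m11, total_weight), Fraction(margins[0] ** 2, n))
```

For margins (2, 1, 2) the first two returned values agree, and the mean of $m_{11}$ equals $n_1^2/n$, which is what the factorial-moment formula gives when a single $a_{11} = 1$.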


The proof of the theorem is similar to that of Theorem 1 in Section 5. We
make the substitutions

$n_i = n e_i, \qquad m_{kj} = n_j - \sum_{i=1}^{k-1} m_{ij},$

$m_{ik} = n_i - \sum_{j=1}^{k-1} m_{ij}, \qquad m_{kk} = 2n_k + \sum_{i=1}^{k-1}\sum_{j=1}^{k-1} m_{ij} - n,$

$m_{ij} = n e_i e_j + \sqrt{n}\, x_{ij},$

in (6.9) and employ Stirling's formula exactly as before. The details are too
similar to warrant repetition. The final result is


) x Ket ott P82 izpq ” Ji ) 
(6.14) D(m,;) = Ke Il dz; ¢ + o( J) . 


where $\| \sigma^{ij,pq} \|$ is the inverse of (6.11) and is defined by


oii? 1 ot = 1 + i + .3 i 
° ’ 
e ex C;e&x C;&x 50; 




















ot"? = i .. o'?4 = a _ 
€1&% ex Cex & 
Theorem 1 is a corollary of Theorem 2. Also we may state these additional 
results: 


COROLLARY 1. If $k$ ($> 3$) kinds of elements are arranged at random and $r$
denotes the total number of runs of all kinds of elements, then the variable

$\dfrac{r - n(1 - \sum e_i^2)}{\sqrt{n}}$

is asymptotically normally distributed with zero mean and variance

$\sigma^2 = \sum e_i^2 - 2\sum e_i^3 + \left(\sum e_i^2\right)^2,$

where $e_i$ is the proportion of elements of the $i$-th kind.

COROLLARY 2. The variable $Q = \sum \sigma^{ij} x_i x_j$, where the $x_i$ are defined by (6.2)
and $\| \sigma^{ij} \|$ is the inverse of (6.3), is asymptotically distributed according to the
$\chi^2$-law with $k$ degrees of freedom.
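The centering constant $n(1 - \sum e_i^2)$ used for the total number of runs can be compared with an exact small-sample mean. The following is a minimal sketch, not part of the paper: it enumerates every ordering of a small multiset (all orderings being equally likely under random arrangement) and returns the exact mean and variance of $r$. The function names are hypothetical. An elementary counting argument gives the exact mean as $n(1 - \sum e_i^2) + 1$, so the centering differs from it only by a constant that is negligible after division by $\sqrt{n}$.

```python
from fractions import Fraction
from itertools import permutations


def count_runs(sequence):
    """Number of maximal blocks of equal adjacent elements."""
    return 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


def exact_run_moments(counts):
    """Exact mean and variance of the total number of runs r when
    counts[i] elements of kind i are arranged at random."""
    elements = [kind for kind, n_i in enumerate(counts) for _ in range(n_i)]
    total = Fraction(0)
    total_sq = Fraction(0)
    orderings = 0
    # permutations() treats equal elements as distinct, so every
    # arrangement of the multiset receives the same weight.
    for order in permutations(elements):
        r = count_runs(order)
        total += r
        total_sq += r * r
        orderings += 1
    mean = total / orderings
    return mean, total_sq / orderings - mean ** 2
```

With counts (3, 2, 1), for instance, $n = 6$ and $\sum e_i^2 = 14/36$, so the exact mean is $6(1 - 14/36) + 1 = 14/3$.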

As was mentioned in Section 4, we could define variables $s_{ij}$ ($i = 1, 2, \dots, k$
and $j = 1, 2, \dots, h_i$, the $h_i$ being a set of $k$ arbitrary integers) with a distribution
similar to (2.14). If one worked through the details he would find, no
doubt, that these variables are asymptotically normal. The matrix of variances
and covariances is so complicated, however, that such a theorem would
hardly be useful to the statistician, and the author does not feel that it would
be worthwhile to go through the long and tedious details merely for the sake of
completeness.


Part II 


Instead of having the number of elements of each kind fixed, we now suppose
that they are randomly drawn from a binomial or multinomial population. The
numbers $n_i$ thus become random variables subject only to the restriction that
$\sum n_i = n$, the sample number. The development will be entirely analogous to
that of Part I, and the same notation will be used. The probability associated
with the $i$th kind of element will be denoted by $p_i$.


7. Distributions and moments. The major part of the derivation of the
various distribution functions has already been done in Sections 2 and 3. With
the distributions of these sections we need only employ the fundamental
relation

(7.1) $P(X, Y) = P_1(X \mid Y)\,P_2(Y)$

in order to obtain the distributions required here. $X$ will represent the set of
variables $r_{ij}$ or $r_i$, and $Y$ the variables $n_i$. For the binomial population
$P_2(Y)$ will be

(7.2) $P_2(n_1, n_2) = \dbinom{n}{n_1} p_1^{n_1} p_2^{n_2}.$


Therefore we may write down at once the distributions 


qa) Porm) =[" ||" ]Pe,mppt at, 


(7.4) P(r, ni) = 


Je 
(7.5) P(r, n p= ("2 a pr? 


1 


~@~tae~% ss 
(7.6) P(81;, y=[2](" “e a-0 im 4 ) pi p;’, 


rnvawed= [SISO 


‘io B- ~ = 1 - ') F(s;, 82)p"" ps", 


t=1,---,k,j=1,---,h, 





corresponding to the distributions (2.6), (2.9), (2.11), (2.13) and (2.14) respectively.
Of course there is some dependence among the arguments. In (7.4),
for example, $n_1$ is determined by $\sum_i i\, r_{1i} = n_1$, and $n_2$ by $n - n_1 = n_2$. In the
last three distributions one of the $n_i$ is independent and one may sum these
with respect to $n_1$ from zero to $n$ and obtain the distributions of the $r$'s alone.
The results of such summations are quite cumbersome and in some cases can
only be indicated, so we shall retain the $n_i$ as relevant variables. This remark
applies also to the multinomial distribution.

We shall obtain expressions for the joint moments of the variables in these 
distributions. It is clear that the moments in Section 3 will be of considerable 
aid; for, using the notation of (7.1), we have 


(7.8) $E(f(X)g(Y)) = \sum_{X,Y} f(X)g(Y)P(X, Y) = \sum_{Y} g(Y)P_2(Y)\left[\sum_{X} f(X)P_1(X \mid Y)\right]$


and the sum in the bracket on the right has been computed in Section 3. It re- 
mains only for us to multiply the previous moments by g(Y)P2(Y) and sum on 
Y. Corresponding to (3.4), (3.12), (3.9) and (3.19) we have 


m1 n — ——— - 
(7.9) B(n‘e II ri?) ae a n\ (ne }- 1) 7 (" 210; ”" pi p2?, 
1 ni=0 Nm, — 210; 


k n ° , 
a a; a a; o> ;— 24; m1," 
(7.10) B( ni ) I st?) —_ > ni (ne + 1)° Fe) (” 210; *) pi pe . 


n\>= 


mM — Dia; 


(7.11) E(n'r®) = DY nm + )(™ — ?) pm pr, 
n,=0 ma = b 


k h ° 
B(n II 3c? II of”) - 7 n® 96248) o(2b3) - — 2u;+ a, — ') 
1 1 18182 a = >'a; —1 
(7.12) by +b ‘ 
Ng — 20; aed nian 
( in 5'b, a )Fe, S2)pi' pi", 
for moments from (7.4), (7.6), (7.5) and (7.7) respectively. In order to perform 
the summations indicated in these last relations it is necessary to expand the 
factors multiplying the binomial coefficient in factorial powers of its lower 
index. That is, we must write 


a+b 
(7.13) n\ (ne + 1)” = >) Cyn, a, b)(m — b)”. 
i=0 


Again it is not possible to give a simple expression for the coefficients C,(n, a, b) 
in general, but for the first few moments they present no difficulty. For example 
from (7.9) 







n 


E(uru) = 2 m(n — m + 1) (" ~ a ) ptt pt 


n,=0 


> lim — i +1) + (n — 2m — t) + Mm — 1)7) 


m—t—1)\ a, as 
( m—t pi Ps 


= Dim -se (5; t) + (m= 29 - i=) 


n-i-—2 , a/{/n—-i—-3 M1 ms 
(22523) -@-s- n(n 
= [i(n — i +1) + (n — 21)(n —i—-1) pi. — (n —i-1) pilpi~n. 


We give below some means, variances and covariances which will be required 
later. 


$E(r_{1i}) = p_1^i p_2[(n - i - 1)p_2 + 2],$
$E(s_{1k}) = p_1^k[(n - k)p_2 + 1],$
Onsriy = Pi 'pa{(n — t — fps + (n — i — j)po(l + 5~1) + 6pi 
— [(n — ¢ — 1)p2 + 2][(n — j — 1) + 2}, 
Oring = Pi Pal(m — 21) ps + (n — 2i)px(1 + Spr) + 6pi 
— [(n — i — 1)p2 + QF} + pipl(n — i — 1)p2 + 2, 
Origray = PiP{(m — i — j — 2) pip + 4(n — i —j — 1)pime +2 
— [(n —t — 1)p2 + 2i[(n — 7 — Ip + 2], 
Cum = Pi p{(n —t—k +1)° — An —i—k)p 
+ (n-—i—k—1)%pi — [(n — i — 1m + 2IN[(n — k)m + I}, 
Sry = Pi {(n — 2k + 1) — An — 2k)p, + (nm — 2k) pi 
— [(n — k)p2 + 17} + pil(n — k)p + 1, 
Couns = Pipe{(n — k — j — 2) pipe + 2(n — k — j — I)p—(l + m) 
+ 2(1 + pi) — pil(n — k)pe + U[(n — 7 — 1)p: + QI}. 
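The two means at the head of this list lend themselves to a direct check. The sketch below is an illustration only: it reads them as $E(r_{1i}) = p_1^i p_2[(n - i - 1)p_2 + 2]$ for runs of exactly $i$ elements of the first kind and $E(s_{1k}) = p_1^k[(n - k)p_2 + 1]$ for runs of at least $k$ such elements, and it enumerates every sequence of length $n$ from a binomial population. The function name is hypothetical, and the first formula is assumed to be used only for $i < n$.

```python
from fractions import Fraction
from itertools import groupby, product


def expected_run_counts(n, p1, i, k):
    """Exact E(number of runs of kind 1 of length exactly i) and
    E(number of runs of kind 1 of length at least k) in n independent
    trials, kind 1 having probability p1."""
    p2 = 1 - p1
    mean_exact = Fraction(0)
    mean_at_least = Fraction(0)
    for seq in product((1, 2), repeat=n):
        prob = Fraction(1)
        for kind in seq:
            prob *= p1 if kind == 1 else p2
        # lengths of the maximal blocks made of elements of kind 1
        lengths = [len(list(block)) for kind, block in groupby(seq) if kind == 1]
        mean_exact += prob * sum(1 for length in lengths if length == i)
        mean_at_least += prob * sum(1 for length in lengths if length >= k)
    return mean_exact, mean_at_least
```

Exact rational arithmetic is used so that the comparison with the closed forms is an equality rather than a numerical approximation.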
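In order to obtain the distribution of runs in samples from a multinomial
population, we multiply the distributions of Section 4 by

(7.16) $P(n_i) = \dfrac{n!}{\prod_i n_i!} \prod_{i=1}^{k} p_i^{n_i}.$

Corresponding to (4.1) and (4.2) then, we have

(7.17) $P(r_{ij}, n_i) = \prod_{i=1}^{k} \dfrac{r_i!}{\prod_j r_{ij}!}\, F(r_i) \prod_{i=1}^{k} p_i^{n_i},$

(7.18) $P(r_i, n_i) = \prod_{i=1}^{k} \dbinom{n_i - 1}{r_i - 1} F(r_i) \prod_{i=1}^{k} p_i^{n_i}.$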


— 
















(7.19) 





(7.20) 
























(8.1) 













ances 









(8.2) 

























E(r;) 


Cre, = 


Or;r; 


8. Asymptotic 


In (7.17) $r_{ij}$ is the number of runs of length $j$ of elements with probability $p_i$.
In (7.18) $r_i$ is the total number of runs of elements with probability $p_i$. As
before, we shall investigate in detail only the distribution (7.18). The moments
of $n_i$ and $r_i$ follow at once from (7.8) and (4.5):

(7.19) $E\left(\prod_i n_i^{a_i} v_i^{(b_i)}\right) = \sum \prod_i \left[n_i^{a_i}(n_i - 1)^{(b_i)}\right] \dfrac{(n - \sum b_i)!}{\prod_i (n_i - b_i)!} \prod_i p_i^{n_i},$

where $v_i = n_i - r_i$. The means, variances and covariances of the $r_i$ are

(7.20) $E(r_i) = np_i(1 - p_i) + p_i^2,$
       $\sigma_{r_i r_j} = -np_i p_j(1 - 2p_i - 2p_j + 3p_i p_j) - p_i p_j(2p_i + 2p_j - 5p_i p_j),$
       $\sigma_{r_i r_i} = np_i(1 - 4p_i + 6p_i^2 - 3p_i^3) + p_i^2(3 - 8p_i + 5p_i^2).$
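The means, variances and covariances of the run counts $r_i$ in a multinomial sample can be verified exactly for small $n$. The sketch below is an illustration under stated assumptions: it takes the formulas to be $E(r_i) = np_i(1 - p_i) + p_i^2$, $\sigma_{r_i r_i} = np_i(1 - 4p_i + 6p_i^2 - 3p_i^3) + p_i^2(3 - 8p_i + 5p_i^2)$ and $\sigma_{r_i r_j} = -np_i p_j(1 - 2p_i - 2p_j + 3p_i p_j) - p_i p_j(2p_i + 2p_j - 5p_i p_j)$, and compares them with an enumeration of all $k^n$ sequences. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import groupby, product


def run_count_moments(n, probs):
    """Exact means and covariance matrix of r_1..r_k, the numbers of runs
    of each kind, for n independent draws with probabilities probs."""
    k = len(probs)
    mean = [Fraction(0)] * k
    second = [[Fraction(0)] * k for _ in range(k)]
    for seq in product(range(k), repeat=n):
        weight = Fraction(1)
        for kind in seq:
            weight *= probs[kind]
        runs = [0] * k
        for kind, _block in groupby(seq):
            runs[kind] += 1  # each maximal block is one run of its kind
        for i in range(k):
            mean[i] += weight * runs[i]
            for j in range(k):
                second[i][j] += weight * runs[i] * runs[j]
    cov = [[second[i][j] - mean[i] * mean[j] for j in range(k)] for i in range(k)]
    return mean, cov


def formula_mean(n, p):
    return n * p * (1 - p) + p ** 2


def formula_variance(n, p):
    return (n * p * (1 - 4 * p + 6 * p ** 2 - 3 * p ** 3)
            + p ** 2 * (3 - 8 * p + 5 * p ** 2))


def formula_covariance(n, p, q):
    return (-n * p * q * (1 - 2 * p - 2 * q + 3 * p * q)
            - p * q * (2 * p + 2 * q - 5 * p * q))
```

Because the enumeration grows as $k^n$, this is practical only for small cases such as $n = 5$, $k = 3$; it is meant as a consistency check, not as a way of computing the moments.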





The variables 


Ox;2; 
Oz;2; 
Ozzy 
Cxzz, 
Sy5y; 
Fyiu5 
Fxiy; 
oxy; 
Cz 

Oxns 

Cy;s 


Oss 


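(8.1) $u_i = x_i = \dfrac{s_{1i} - np_1^i p_2^2}{\sqrt{n}}, \qquad i = 1, \dots, k - 1,$

      $u_k = x_k = \dfrac{s_{1k} - np_1^k p_2}{\sqrt{n}},$

      $u_{k+i} = y_i = \dfrac{s_{2i} - np_1^2 p_2^i}{\sqrt{n}}, \qquad i = 1, \dots, h - 1,$

      $u_{k+h} = z = \dfrac{n_1 - np_1}{\sqrt{n}},$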


pips — (21 + 1)pi‘ps + 2pi'" ps, 
—( +5 + V)pi’ps + 2pi" pi, 


sie —(¢+k+ 1)pit*p} + pit**} , 


Pip: — (2k + 1)pi‘p, 

—(i +j + 1)pip3*? + Qpips — 

pip, — (2i + 1)pips' + optet, 

—(i +j + 3)p3**p} 7+2 4. 2pi* pi - 

—(k + 5 + 2)pr*pi™ + pi pi(l + mr), 
‘pip: + pi*'po(l — 4p,), 


= (k + 1)pip: — pi(l + 7), 


It 


ipip: + pips""(1 — 4p,), 
Pipe - 







The moments 


n¢ 
7 


5ppi) » 


8. Asymptotic distributions from binomial population. We turn our attention
first to the distribution (7.7) and state a theorem analogous to Theorem 2 of
Section 5.

THEOREM 1. The variables (8.1) are asymptotically normally distributed with
zero means and variances and covariances (8.2).


We have taken $s_{2h}$ and $n_2$ to be the dependent variables of (7.7). The method of
proof of this theorem is the same as that of Theorem 1 in Section 5, and will be
omitted. As consequences of the theorem we have

COROLLARY 1. The variable

$Q = \sum_{i,j=1}^{k+h} \sigma^{ij} u_i u_j$

is asymptotically distributed according to the $\chi^2$-law with $k + h$ degrees of freedom.

COROLLARY 2. Any subset $u_{i_1}, u_{i_2}, \dots, u_{i_m}$ of the variables (8.1) is asymptotically
normally distributed with zero means and variances and covariances $\| \sigma_{i_j i_k} \|$,
and

$Q = \sum_{j,k=1}^{m} \sigma^{i_j i_k} u_{i_j} u_{i_k}$

is asymptotically distributed according to the $\chi^2$-law with $m$ degrees of freedom.
$\| \sigma^{i_j i_k} \|$ is the inverse of $\| \sigma_{i_j i_k} \|$.
COROLLARY 3. If $s_i = s_{1i} + s_{2i}$ represents the total number of runs of length $i$ of
both kinds of elements, and $s_k$ the number of runs of length greater than $k - 1$, then
the variables

(8.3) $x_i = \dfrac{s_i - n(p_1^i p_2^2 + p_1^2 p_2^i)}{\sqrt{n}}, \qquad x_k = \dfrac{s_k - n(p_1^k p_2 + p_1 p_2^k)}{\sqrt{n}}$

are asymptotically normally distributed with zero means and variances and
covariances

(8.4) $\sigma_{ij} = \sigma_{x_i x_j} + \sigma_{x_i y_j} + \sigma_{x_j y_i} + \sigma_{y_i y_j}$

where the terms on the right of (8.4) are defined by (8.2). We have put $h = k$
in Theorem 1 to obtain this result.

COROLLARY 4. The variable

(8.5) $Q = \sum_{i,j=1}^{k} \sigma^{ij} x_i x_j$

where the $x_i$ are defined by (8.3) and $\| \sigma^{ij} \|$ is the inverse of (8.4), is asymptotically
distributed according to the $\chi^2$-law with $k$ degrees of freedom.

COROLLARY 5. If $r$ denotes the total number of runs of both kinds of elements,
then

(8.6) $\dfrac{r - 2np_1 p_2}{2\sqrt{np_1 p_2(1 - 3p_1 p_2)}}$

is asymptotically normally distributed with zero mean and unit variance. This is
the result obtained by Wishart and Hirshfeld [11].


9. Asymptotic distributions from the multinomial population. In this
section we assume $k > 2$ to avoid degenerate distributions. Because of the
function $F(r_i)$ in (7.18) we do not investigate this distribution directly, but
derive a more general asymptotic distribution as was done in Section 6. We
consider the distribution


(9.1) $D(m_{ij}, n_i) = \prod_{i=1}^{k} \left( \dfrac{n_i!}{\prod_j m_{ij}!}\, p_i^{n_i} \right)$


corresponding to (6.9). This is derived from (7.19) in the same manner as
(6.9) was from (4.5). As before, we have replaced the numbers $n_i - 1$ in (7.19)
by $n_i$, an unessential change as far as the asymptotic theory is concerned.
We recall that

(9.2) $r_i = n_i - m_{ii};$

hence we need only show that the variables on the right are asymptotically
normally distributed in order to have the same result for the $r_i$. Corresponding
to Theorem 2 of Section 6, we state

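THEOREM 1. The variables

(9.3) $x_{ij} = \dfrac{m_{ij} - np_i p_j}{\sqrt{n}}, \qquad x_i = \dfrac{n_i - np_i}{\sqrt{n}}, \qquad i, j = 1, \dots, k,$

are asymptotically normally distributed with zero means and variances and
covariances

(9.4) $\sigma_{ij,st} = -3p_i p_j p_s p_t,$          $\sigma_{ij,it} = -3p_i^2 p_j p_t,$
      $\sigma_{ii,st} = -3p_i^2 p_s p_t,$            $\sigma_{ii,ij} = p_i^2 p_j(1 - 3p_i),$
      $\sigma_{ij,ij} = p_i p_j(1 - 3p_i p_j),$      $\sigma_{ii,ii} = p_i^2(1 + 2p_i - 3p_i^2),$
      $\sigma_{ii,jj} = -3p_i^2 p_j^2,$              $\sigma_{ij,s} = -2p_i p_j p_s,$
      $\sigma_{ii,s} = -2p_i^2 p_s,$                 $\sigma_{ij,i} = p_i p_j(1 - 2p_i),$
      $\sigma_{ii,i} = 2p_i^2(1 - p_i),$             $\sigma_{i,j} = -p_i p_j,$
      $\sigma_{i,i} = p_i(1 - p_i).$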
In these relations the symbols are defined by

$\sigma_{ij,st} = \sigma_{x_{ij} x_{st}}, \qquad \sigma_{ij,s} = \sigma_{x_{ij} x_s}, \qquad \sigma_{i,j} = \sigma_{x_i x_j},$

and different literal subscripts represent different numerical subscripts. These
moments have been computed by means of the identity (6.12). The proof of
the theorem is like that of Theorem 2 of Section 6 and will be omitted. We can
now give the limiting form of the distribution of the $r_i$ in (7.18) as
COROLLARY 1. The variables

(9.5) $x_i = \dfrac{r_i - np_i(1 - p_i)}{\sqrt{n}}, \qquad i = 1, 2, \dots, k,$

are asymptotically normally distributed with zero means and variances and
covariances

(9.6) $\sigma_{ii} = p_i(1 - p_i) - 3p_i^2(1 - p_i)^2,$
      $\sigma_{ij} = -p_i p_j(1 - 2p_i - 2p_j + 3p_i p_j).$

These limiting moments follow at once from equations (7.20).

COROLLARY 2. The variable

$Q = \sum_{i,j=1}^{k} \sigma^{ij} x_i x_j$

where the $x_i$ are defined by (9.5) and $\| \sigma^{ij} \|$ is the inverse of (9.6), is asymptotically
distributed according to the $\chi^2$-law with $k$ degrees of freedom.
COROLLARY 3. If $r = \sum r_i$ denotes the total number of runs, then

$\dfrac{r - n(1 - \sum p_i^2)}{\sqrt{n}}$

is asymptotically normally distributed with zero mean and variance

$\sigma^2 = \sum p_i^2 + 2\sum p_i^3 - 3\left(\sum p_i^2\right)^2.$
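The variance of the total number of runs should equal the sum of all entries of the limiting covariance matrix of the $r_i$. The sketch below is an illustration under stated assumptions: it takes those entries to be $\sigma_{ii} = p_i(1 - p_i) - 3p_i^2(1 - p_i)^2$ and $\sigma_{ij} = -p_i p_j(1 - 2p_i - 2p_j + 3p_i p_j)$, takes the total variance to be $\sum p_i^2 + 2\sum p_i^3 - 3(\sum p_i^2)^2$, and compares the two in exact arithmetic. The function names are hypothetical, and the probabilities are assumed to sum to one.

```python
from fractions import Fraction


def total_run_variance(probs):
    """Limiting variance of (r - n(1 - sum p_i^2)) / sqrt(n)."""
    s2 = sum(p ** 2 for p in probs)
    s3 = sum(p ** 3 for p in probs)
    return s2 + 2 * s3 - 3 * s2 ** 2


def summed_covariances(probs):
    """Sum of every entry of the limiting covariance matrix of the r_i."""
    total = Fraction(0)
    for i, p in enumerate(probs):
        for j, q in enumerate(probs):
            if i == j:
                total += p * (1 - p) - 3 * p ** 2 * (1 - p) ** 2
            else:
                total += -p * q * (1 - 2 * p - 2 * q + 3 * p * q)
    return total
```

For two kinds of elements the same expression reduces to $4p_1 p_2(1 - 3p_1 p_2)$, the square of the denominator in the binomial result for the total number of runs.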


The author would like to record here his gratitude to Professor S. S. Wilks 
who suggested the problem and under whose direction this paper was written. 


REFERENCES 


[1] Karl Pearson, The Chances of Death and Other Studies in Evolution, London, 1897,
Vol. I, Chap. 2.
[2] Karl Marbe, Naturphilosophische Untersuchungen zur Wahrscheinlichkeitslehre,
Leipzig, 1899.
[3] Karl Marbe, Die Gleichförmigkeit in der Welt, München, 1916.
[4] Karl Marbe, Mathematische Bemerkungen, München, 1916.
[5] Karl Marbe, Grundfragen der angewandten Wahrscheinlichkeitsrechnung, München,
1934.
[6] H. Grünbaum, Isolierte und reine Gruppen und die Marbe'sche Zahl "p", Würzburg,
1904.
[7] H. Bruns, Wahrscheinlichkeitsrechnung und Kollektivmasslehre, Leipzig, 1906, p. 216.
[8] L. v. Bortkiewicz, Die Iterationen, Berlin, 1917.
[9] R. v. Mises, Zeit. f. angew. Math. u. Mech., Vol. 1 (1921), p. 298.
[10] E. Ising, Zeit. f. Physik, Vol. 31 (1925), p. 253.
[11] J. Wishart and H. O. Hirshfeld, London Math. Soc. Jour., Vol. 11 (1936), p. 227.
[12] W. L. Stevens, Annals of Eugenics, Vol. 9 (1939), p. 10.
[13] A. Wald and J. Wolfowitz, Annals of Math. Stat., Vol. 11 (1940).
[14] J. A. Joseph, Annals of Math. Stat., Vol. 10 (1939), p. 293.


PRINCETON UNIVERSITY, 
Princeton, N. J. 








A GENERALIZATION OF THE LAW OF LARGE NUMBERS 
By HILDA GEIRINGER


It is well known that the law of large numbers can be established for dependent
as well as for independent chance variables by using Tchebycheff's inequality [1]
and assuming that the variance of the sum of the variables tends towards
infinity less rapidly than $n^2$.

In recent years v. Mises has introduced the notion of statistical functions [2]
and has shown that, under certain assumptions, the law of large numbers is still
valid if, instead of the arithmetic mean of the $n$ observations $x_1, \dots, x_n$, a
statistical function of these observations is considered. For example in the very
special case, where the $n$ collectives which have been observed are identical
$k$-valued arithmetic distributions with probabilities $p_1, \dots, p_k$ corresponding
to the attributes $c_1, \dots, c_k$ and with observed relative frequencies $n_1/n, \dots,
n_k/n$, one obtains the result: It is to be expected for every $\varepsilon > 0$ with a probability
$P_n$ converging towards one as $n \to \infty$, that $|f(n_1/n, \dots, n_k/n) - f(p_1, \dots, p_k)|
< \varepsilon$ under very general conditions concerning the function $f$.

In the present paper we shall generalize these new results so that they will 
apply also to collectives which are not independent. 


1. Lemma concerning alternatives. Let us consider the $n$-dimensional
collective consisting of a sequence of $n$ trials and let us assume that the $n$ trials are
alternatives, i.e. for each trial there are only two possible results which we
denote by "success," "failure," by "occurrence," "non-occurrence" or by
"1," "0." The total result of the $n$ trials is expressed by $n$ numbers each equal
to 0 or 1. Let $v(x_1, x_2, \dots, x_n)$ be the probability of obtaining the result $x_1$
at the first trial, $x_2$ at the second one, $\dots$, $x_n$ at the last one ($x_\nu = 0, 1$; $\nu =
1, \dots, n$). In the same way we introduce $v_{12}(x, y) = \sum_{x_3, \dots, x_n} v(x, y, x_3, \dots, x_n)$
and generally $v_{\mu\nu}(x, y)$ as the probability that the $\mu$th result equals $x$, the $\nu$th
equals $y$ ($\mu \ne \nu$), and finally let $v_\mu(x) = \sum_y v_{\mu\nu}(x, y)$ be the probability that the
$\mu$th result equals $x$. In particular let us write

$v_\mu(1) = p_\mu; \qquad v_{\mu\nu}(1, 1) = p_{\mu\nu} \qquad (\mu, \nu = 1, \dots, n;\ \mu \ne \nu),$

$p_\mu$ being the probability of success in the $\mu$th trial and $p_{\mu\nu}$ the probability of
simultaneous success both in the $\mu$th and $\nu$th trials.
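The variance $s_n^2$ of the sum $(x_1 + \cdots + x_n)$ is easily found:

$s_n^2 = \mathrm{Var}(x_1 + \cdots + x_n) = \sum (x_1 + \cdots + x_n - p_1 - \cdots - p_n)^2\, v(x_1, \dots, x_n)$
$\quad = \sum (x_1 - p_1)^2\, v(x_1, \dots, x_n) + \cdots + 2\sum (x_1 - p_1)(x_2 - p_2)\, v(x_1, \dots, x_n) + \cdots$
$\quad = \sum (x_1 - p_1)^2\, v_1(x_1) + \cdots + 2\sum (x_1 - p_1)(x_2 - p_2)\, v_{12}(x_1, x_2) + \cdots$
$\quad = p_1(1 - p_1) + \cdots + p_n(1 - p_n) + 2(p_{12} - p_1 p_2) + \cdots + 2(p_{n-1,n} - p_{n-1} p_n).$

Thus:

(1) $s_n^2 = \mathrm{Var}(x_1 + \cdots + x_n) = \sum_{\nu=1}^{n} p_\nu(1 - p_\nu) + 2\sum_{\mu<\nu} (p_{\mu\nu} - p_\mu p_\nu).$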
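Formula (1) can be illustrated on a small dependent example. The sketch below is not from the paper: it builds a hypothetical joint law on $\{0, 1\}^n$ that favours equal neighbouring results, then computes the variance of the sum once directly and once as $\sum p_\nu(1 - p_\nu) + 2\sum_{\mu<\nu}(p_{\mu\nu} - p_\mu p_\nu)$. The function names and the particular weighting are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def dependent_alternatives(n):
    """A hypothetical joint law on {0,1}^n: weight 1 plus the number of
    equal neighbouring results, normalised to total probability one."""
    weights = {}
    for x in product((0, 1), repeat=n):
        weights[x] = 1 + sum(1 for a, b in zip(x, x[1:]) if a == b)
    total = sum(weights.values())
    return {x: Fraction(w, total) for x, w in weights.items()}


def variance_two_ways(v):
    """Return (direct variance of the sum, value of formula (1),
    sum over mu < nu of p_{mu nu} - p_mu p_nu)."""
    n = len(next(iter(v)))
    p = [sum(pr for x, pr in v.items() if x[m] == 1) for m in range(n)]
    pairs = [(m, u) for m in range(n) for u in range(m + 1, n)]
    p_joint = {(m, u): sum(pr for x, pr in v.items() if x[m] == 1 and x[u] == 1)
               for m, u in pairs}
    mean = sum(p)
    direct = sum(pr * (sum(x) - mean) ** 2 for x, pr in v.items())
    a_sum = sum(p_joint[m, u] - p[m] * p[u] for m, u in pairs)
    formula = sum(p_m * (1 - p_m) for p_m in p) + 2 * a_sum
    return direct, formula, a_sum
```

The third returned value is the sum of the $\tfrac{1}{2}n(n - 1)$ cross terms, which the text bounds below by $-n/8$.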


The first sum on the right is $\le n/4$; the second one consists of $N = \tfrac{1}{2}n(n - 1)$
terms, therefore we cannot be sure that it tends toward zero after division by $n^2$.
Putting $p_{\mu\nu} - p_\mu p_\nu = a_{\mu\nu}^{(n)}$ we see immediately:

(a) A necessary and sufficient condition for $\lim_{n \to \infty} s_n/n = 0$ is

(2) $\lim_{n \to \infty} \dfrac{1}{n^2} \sum_{\mu<\nu} a_{\mu\nu}^{(n)} = 0.$

Denoting by $\sigma_\nu^2$ the variance of $v_\nu(x)$ and by $r_{\mu\nu}$ the correlation coefficient of
$v_{\mu\nu}(x, y)$ we have

$a_{\mu\nu}^{(n)} = p_{\mu\nu} - p_\mu p_\nu = r_{\mu\nu}\sigma_\mu\sigma_\nu.$

We see that $a_{\mu\nu}^{(n)}$ takes values between $-1/4$ and $+1/4$ and our condition (2)
postulates that the sum of these positive and negative terms tends towards
infinity less rapidly than $n^2$. As to the meaning of the signs of these terms we
see that a term $a_{\mu\nu}^{(n)}$ will be $\gtreqless 0$ according as $p_{\mu\nu}/p_\nu \gtreqless p_\mu$. This means: the
fact that the $\nu$th event has presented itself makes the occurrence of the $\mu$th
event either more probable; or it is without influence on it; or it makes it less
probable. And we see that $s_n/n$ tends toward zero only if there is a certain
"equalization" or "stabilization" of positive and negative mutual influence.
If in particular for a pair of values $\mu, \nu$, $r_{\mu\nu} = +1$, that is $v_{\mu\nu}(0, 1) = v_{\mu\nu}(1, 0) = 0$,
the events must either both occur or both fail and $p_\mu = p_\nu$. If $r_{\mu\nu} = -1$ we
have $v_{\mu\nu}(0, 0) = v_{\mu\nu}(1, 1) = 0$; the simultaneous occurrence is impossible and
likewise the simultaneous failure, and $p_\mu + p_\nu = 1$. If we have $p_{\mu\nu} = 0$ (case of
mutually exclusive events) then $p_\mu + p_\nu \le 1$.


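Since $s_n^2 \ge 0$ and $\sum_{\nu=1}^{n} p_\nu(1 - p_\nu) = \sum_{\nu=1}^{n} \sigma_\nu^2 \le n/4$ we conclude from (1) that
$\sum_{\mu<\nu} a_{\mu\nu}^{(n)} \ge -n/8$ and we obtain the following simple sufficient condition for the
validity of (2):

(b) Let us denote by $m_n$ the number of all combinations $\mu, \nu$ ($\mu \le n$; $\nu \le n$; $\mu \ne \nu$),
such that, however large $n$ may be, $a_{\mu\nu}^{(n)} > \varepsilon$, where $\varepsilon$ is a given positive number;
then $\dfrac{1}{n^2} \sum_{\mu<\nu} a_{\mu\nu}^{(n)}$ converges toward zero if $\lim_{n \to \infty} m_n/n^2 = 0$.

We have in fact

$-\dfrac{n}{8} \le \sum_{\mu<\nu} a_{\mu\nu}^{(n)} \le \dfrac{m_n}{4} + (N - m_n)\varepsilon$

and dividing by $n^2$ we find that $\dfrac{1}{n^2} \sum_{\mu<\nu} a_{\mu\nu}^{(n)}$ is enclosed between $-\dfrac{1}{8n}$ and
$\dfrac{m_n}{4n^2} + \dfrac{N - m_n}{n^2}\varepsilon$. The lower bound tends toward zero, and the upper bound
eventually stays below $\varepsilon$, since its first term tends toward zero and $N/n^2 < \tfrac{1}{2}$;
here $\varepsilon$ may be chosen arbitrarily small. Roughly speaking this condition implies
that for "almost all" combinations of indices $\mu, \nu$, the $a_{\mu\nu}^{(n)}$ converge toward
"negative or vanishing correlation."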


On the other hand the sum of all positive and negative terms in $\sum_{\mu<\nu} a_{\mu\nu}^{(n)}$
cannot become less than $-n/8$. Therefore, if "almost all" positive terms are
supposed to tend towards zero it follows that also almost all negative terms
tend toward zero. Thus we obtain the sufficient condition (c) which is neither
more nor less general than (b):

(c) The sum $\dfrac{1}{n^2} \sum_{\mu<\nu} a_{\mu\nu}^{(n)}$ tends towards zero as $n \to \infty$, if "almost all" the
individual terms $a_{\mu\nu}^{(n)} = p_{\mu\nu} - p_\mu p_\nu$ tend toward zero. Or more exactly, the sum in
question converges toward zero if $|a_{\mu\nu}^{(n)}| < \varepsilon$ for every $\varepsilon$ and sufficiently large $n$ with
the exception of $u_n$ terms where $\lim u_n/n^2 = 0$. That is "convergence towards
independence" for almost all combinations $\mu, \nu$ of indices. Let us, for example,
assume that all the $p_\nu$ are $\ne 0$ and all the $p_{\mu\nu} = 0$, then all the $a_{\mu\nu}^{(n)}$ are certainly
$< 0$ and (b) is fulfilled; but it is easily seen [3] that in this case $p_1 + p_2 + \cdots +
p_n \le 1$. Therefore all the products $p_\mu p_\nu$ (with the possible exception of a finite
number) tend toward zero, and (c) holds as well.


2. Statistical functions. Suppose $n$ observations have given the results
$x_1, x_2, \dots, x_n$. Let us assume for the sake of simplicity that they are all
bounded between two real numbers $A$ and $B$. To each real $x$ corresponds the
number $n S_n(x)$ of observations with a result $\le x$. $S_n(x)$ is a monotone
non-decreasing step function with $n$ steps, each of height $1/n$; however several steps
may coincide at the same point. We have

(1) $S_n(x) = 0$ if $x < A$ and $S_n(x) = 1$ if $x \ge B$.

$S_n(x)$ is called by v. Mises the partition (Aufteilung) of the $n$ observations.
$S_n(x)$ coincides with the well known cumulative frequency distribution if the
attributes $c_\kappa$ ($\kappa = 1, \dots, k$) and the corresponding relative frequencies $n_1/n, \dots,
n_k/n$ are given.





A statistical function is a function of the $x_1, x_2, \dots, x_n$ which depends only on
$S_n(x)$, the partition of the $n$ results. It will be denoted by $f\{S_n(x)\}$. If the $c_\kappa$
and the $n_\kappa/n$ are given then statistical function means simply "function of the
relative frequencies" and it becomes a function of $k$ variables. In $f\{S_n(x)\}$ the
partition $S_n(x)$ takes the place of the independent variable. Such a statistical
function has the following properties: (a) It is a symmetric function of the
$x_1, x_2, \dots, x_n$. That is, it is independent of the succession of the $n$ results.
(b) It is "homogeneous" in the following sense: If instead of $n$ observations
we have $nl$ observations and if at the same time each $x_\nu$ is replaced by $l x_\nu$
(that is, by $l$ observations equal to $x_\nu$), then
the statistical function is not changed.¹ Examples of statistical functions are
the moments

$\dfrac{1}{n} \sum_{\nu=1}^{n} x_\nu^r = \int x^r\, dS_n(x) = M_r^0,$

or, if $M_1^0 = a$, the moments about the mean $a$:

$\dfrac{1}{n} \sum_{\nu=1}^{n} (x_\nu - a)^r = \int (x - a)^r\, dS_n(x) = M_r$, etc.
The independent variable in $f\{S_n(x)\}$ is a partition; but in addition we shall
define $f\{P(x)\}$ where $P(x)$ is a certain bounded distribution which is not
necessarily a partition. A distribution $P(x)$ is called bounded if

(1') $P(x) = 0$ if $x < A$ and $P(x) = 1$ if $x \ge B$.

If this is true for a sequence $P_1(x), P_2(x), \dots$ with the same $A$ and $B$ then the
sequence is called uniformly bounded. Let us now consider a bounded partition
$P(x)$ which in every point of continuity of $P(x)$ is the limit as $n \to \infty$ of a
sequence of bounded partitions $S_n(x)$. As $S_n(x)$ converges toward $P(x)$, if
$f\{S_n(x)\}$ converges towards a limit $L$ which does not depend on the limiting
process $S_n(x) \to P(x)$ then that limit shall be denoted by $f\{P(x)\}$; it will be
called the value of the statistical function at the "point" $P(x)$ and $f\{S_n(x)\}$ will be
called continuous at $P(x)$. The definition of continuity can be given also in the
following way: Corresponding to every $\varepsilon > 0$ exists an $\eta > 0$ such that

(2) $|f\{S_n(x)\} - f\{P(x)\}| < \varepsilon$

for all values of $n$ and for every bounded $S_n(x)$ such that at every point of
continuity of $P(x)$

(3) $|S_n(x) - P(x)| \le \eta.$


In this case $f\{S_n(x)\}$ is called continuous at the point $P(x)$. Thus a statistical
function is defined for bounded partitions and for certain bounded distributions
which are not themselves partitions. If the continuity defined by (2) and (3)
exists for a sequence $P_1(x), P_2(x), \dots$ of bounded distributions with the same $\eta$
corresponding to a given $\varepsilon$, we call the statistical function uniformly continuous
at the points $P_1(x), P_2(x), \dots$.

¹ This condition of homogeneity is fulfilled e.g. for $\sqrt[n]{x_1 x_2 \cdots x_n}$ but not for $x_1 x_2 \cdots x_n$.


3. The general law of large numbers. The generalization of the law of large
numbers which we have in mind can be demonstrated in a way analogous to the 
demonstration given by v. Mises in the case of independent collectives if we 
introduce the results of paragraph 1 in order to estimate the variance. We shall 
consider here only one dimensional, bounded collectives in order to make clearer 
what is the essential of the generalization. 

A sequence of dependent collectives $P_1(x), P_2(x), \dots, P_n(x)$ can be given in
the following manner. Let $P(x_1, x_2, \dots, x_n)$ be the probability that the result
of the first observation is $\le x_1$, of the second $\le x_2$, $\dots$, of the $n$th $\le x_n$.
This distribution will be said to be bounded in $(A, B)$ if $P = 1$ when all the $x_\nu$
are $\ge B$ and $P = 0$ if at least one of these arguments is less than $A$. From this
$n$-dimensional distribution we deduce $n$ one dimensional distributions

(1) $P_1(x) = P(x, B, \dots, B),$
    $P_2(x) = P(B, x, B, \dots, B), \ \dots, \ P_n(x) = P(B, \dots, B, x)$

where $P_\nu(x)$ is the probability that the $\nu$th observation be $\le x$. The $P_\nu(x)$ are
uniformly bounded in $(A, B)$ which is a consequence of $P(x_1, x_2, \dots, x_n)$ having
been assumed to be bounded in this interval. In an analogous way we deduce
from $P(x_1, x_2, \dots, x_n)$ the $n(n - 1)$ uniformly bounded two dimensional
distributions

(2) $P_{12}(x, y) = P(x, y, B, \dots, B), \quad P_{13}(x, y) = P(x, B, y, B, \dots, B), \ \dots$

Here $P_{\mu\nu}(x, y)$ is the probability that the $\mu$th result is $\le x$, the $\nu$th result $\le y$,
and we have $P_{\mu\nu}(x, y) = P_{\nu\mu}(y, x)$. Of course we have also

(1') $P_1(x) = P_{12}(x, B) = P_{13}(x, B) = \cdots = P_{1n}(x, B),$
     $P_2(x) = P_{12}(B, x) = P_{23}(x, B) = \cdots = P_{2n}(x, B)$, etc.

If we put in (2) $x = y$ we obtain $P_{\mu\nu}(x, x) = P_{\nu\mu}(x, x)$ and we introduce

(3) $P_{\mu\nu}(x, x) = P_{\mu\nu}(x) = P_{\nu\mu}(x),$

the probability that both the $\mu$th and the $\nu$th observation is $\le x$. Then $P_{\mu\nu}(x)$
equals zero if $x < A$ and equals one if $x \ge B$, and this is valid with the same $A$
and $B$ for all the distributions $P_{\mu\nu}(x)$.

Now if $p_1, p_2, \dots, p_n$ are the probabilities of success for $n$ general alternatives
Tchebycheff's Lemma asserts that the probability $W$ that the average
$(x_1 + x_2 + \cdots + x_n)/n$ of $n$ observations differs by more than $\eta$ from its
expectation $(p_1 + p_2 + \cdots + p_n)/n$ is subject to the following inequality

(4) $W \le \dfrac{1}{\eta^2}\, \mathrm{Var}\left( \dfrac{x_1 + x_2 + \cdots + x_n}{n} \right) = \dfrac{s_n^2}{\eta^2 n^2}.$

Here $s_n^2$ is given by (1) of paragraph 1.
Let us introduce the average $\bar{P}_n(x)$ of the $P_\nu(x)$:

(5) $\bar{P}_n(x) = [P_1(x) + P_2(x) + \cdots + P_n(x)]/n$

and let $Q_n$ be the probability that at any point of continuity of $\bar{P}_n(x)$ the
inequality

(6) $|S_n(x) - \bar{P}_n(x)| > \eta$

holds. Our aim will be to show that for every $\eta$, under certain restrictions
regarding the given collectives, $Q_n$ tends toward zero as $n$ tends toward infinity.
For a fixed point $x'$ the probabilities $P_\nu(x') = p_\nu$ and $P_{\mu\nu}(x') = p_{\mu\nu}$ are constants
and we put $\bar{P}_n(x') = \bar{p}_n = (p_1 + p_2 + \cdots + p_n)/n$. The probability that in $x'$

(7) $|S_n(x') - \bar{P}_n(x')| > \eta/2$

is then, according to (4), smaller than $(s_n^2)_{x'} / \left((\tfrac{1}{2}\eta)^2 n^2\right)$. Here we denote by $(s_n^2)_{x'}$
the value of $s_n^2$ in $x'$ (as given by (1) in paragraph 1).

Now we divide the interval $(A, B)$ in $N$ parts in such a way that in every one
of the $N$ intervals, e.g. in $(x', x'')$, the variation

(8) $\delta = \bar{P}_n(x'') - \bar{P}_n(x') \le \eta/2.$

If there is at $x'$ (or at $x''$) a step of $\bar{P}_n(x)$ we take the limit which $\bar{P}_n(x)$ approaches
as $x \to x'$ (or $x''$) from the interior of the interval. In order to obtain such a
division we need only divide the total variation 1 of $\bar{P}_n(x)$ in $2/\eta$ equal parts and
project these points of division on $\bar{P}_n(x)$, disposing however in a suitable way of
horizontal parts of $\bar{P}_n(x)$. The abscissae of these points form the endpoints
of the $N$ intervals. If there is a step of $\bar{P}_n(x)$ at an endpoint of one of these
intervals the variation in both the adjacent intervals can only be diminished.
It is further possible that the two ends of an interval coincide, $x' = x''$; this will
be so if $\bar{P}_n(x)$ has for $x'$ a step $> \eta/2$. In any case we have a division in $N \le 2/\eta$
intervals such that all the points of continuity of $\bar{P}_n(x)$ are enclosed in them and
in each of these intervals (8) is valid.

Let us now assume that in the left end point $x'$ of the $\nu$th interval $(x', x'')$ the
inequality

(9) $|S_n(x') - \bar{P}_n(x')| \le \eta/2$

is valid. Then we have for every $x$ between $x'$ and $x''$

(10) $|S_n(x) - \bar{P}_n(x)| \le \eta/2 + \delta \le \eta.$

Because, since $S_n(x)$ and $\bar{P}_n(x)$ are both monotone, the difference $S_n(x) -
\bar{P}_n(x)$ cannot increase by more than $\delta \le \eta/2$ as $x$ varies from $x'$ to $x''$. Therefore
if (6) is valid for any point $x$ in this interval then (7) must be valid for
the left end point $x'$ of this interval and the probability $q_\nu$ of this latter inequality
is less than or equal to $4(s_n^2)_{x'}/(\eta^2 n^2)$.

But there are $N$ intervals with the left endpoints $x'_1, x'_2, \dots, x'_N$ and the
probability that (6) may be valid in any point belonging to any one of these
intervals is $\le q_1 + q_2 + \cdots + q_N$. Denoting by $s_n^2$ the greatest of the $N$
variances $(s_n^2)_{x'_1}, (s_n^2)_{x'_2}, \dots, (s_n^2)_{x'_N}$ we have for $Q_n$ (which is the probability that
(6) may be valid at any point of continuity of $\bar{P}_n(x)$) the inequality

(11) $Q_n \le q_1 + q_2 + \cdots + q_N \le \dfrac{4N}{\eta^2} \dfrac{s_n^2}{n^2} \le \dfrac{8}{\eta^3} \dfrac{s_n^2}{n^2}.$

Therefore $Q_n$ tends toward zero for every $\eta$ if $s_n/n$ tends toward zero.
But according to (2) in paragraph 1, $s_n/n$ tends toward zero if for every $x$ in
$(A, B)$

(12) $\lim_{n \to \infty} \dfrac{1}{n^2} \sum_{\mu<\nu} [P_{\mu\nu}(x) - P_\mu(x)P_\nu(x)] = 0.$

Considering the definition of continuity of a statistical function we have
obtained the following result:

As in (1'), (2), (3) and (5) let $P_{\mu\nu}(x, y)$ be two dimensional distributions ($\mu, \nu =
1, 2, \dots$; $\mu \ne \nu$), uniformly bounded in $(A, B)$; $P_{\mu\nu}(x, B) = P_\mu(x)$; $P_{\mu\nu}(x, x) =
P_{\mu\nu}(x)$ and $\bar{P}_\nu(x) = \frac{1}{\nu}(P_1(x) + P_2(x) + \cdots + P_\nu(x))$.

If the variable partition $S_n(x)$ is bounded in $(A, B)$ and if $f\{S_n(x)\}$ is uniformly
continuous at the "points" $\bar{P}_1(x), \bar{P}_2(x), \dots$ then the probability that

(13) $|f\{S_n(x)\} - f\{\bar{P}_n(x)\}| > \varepsilon$

tends toward zero for every $\varepsilon$ as $n \to \infty$, provided (12) is uniformly valid for every
$x$ in $(A, B)$.












4. Examples. Let us illustrate by simple examples.
1) In order to define the $P_\nu(x)$ etc. mentioned in our theorem we define the
$n$-dimensional distribution $P(x_1, x_2, \dots, x_n)$ used at the beginning of paragraph
3 by indicating the probability density

(1) $w(x_1, x_2, \dots, x_n) = C_n[1 - x_1 x_2 \cdots x_n]$ in the "unit cube",
    $w(x_1, x_2, \dots, x_n) = 0$ elsewhere.

The corresponding probability distribution is

(2) $P(x_1, x_2, \dots, x_n) = \int_0^{x_1} \cdots \int_0^{x_n} w(x_1, x_2, \dots, x_n)\, dx_1 \cdots dx_n.$

By putting

(3) $C_n = \dfrac{2^n}{2^n - 1}$

we see that $P(x_1, x_2, \dots, x_n)$ equals unity if all the arguments are $\ge 1$ and it
equals zero if one of these arguments is less than 0. Therefore $P(x_1, x_2, \dots,
x_n)$ is bounded in the unit cube.


From (1) we deduce the two-dimensional densities

(4) $v_{\mu\nu}(x, y) = C_n\left(1 - \dfrac{xy}{2^{n-2}}\right)$ in the unit square,
    $v_{\mu\nu}(x, y) = 0$ elsewhere,

and the distributions

(5) $P_{\mu\nu}(x, y) = \int_0^x \int_0^y v_{\mu\nu}(x, y)\, dx\, dy.$

We see that

$P_{\mu\nu}(x, y) = C_n xy\left(1 - \dfrac{xy}{2^n}\right)$ in the unit square,
$P_{\mu\nu}(x, y) = 0$ if $x$ or $y \le 0$,
$P_{\mu\nu}(x, y) = 1$ if $x$ and $y \ge 1$,

and e.g. for $x \ge 1$, $0 < y < 1$ we have $P_{\mu\nu}(x, y) = P_{\mu\nu}(1, y)$ etc. Thus the
$P_{\mu\nu}(x, y)$ are completely given.

It follows from (3) that $-C_n/2^n = 1 - C_n$; therefore putting $C_n = C$ we
have in $(0, 1)$

(6) $P_{\mu\nu}(x, x) = P_{\mu\nu}(x) = Cx^2 + (1 - C)x^4,$
    $P_\nu(x) = Cx + (1 - C)x^2,$

therefore

(7) $P_{\mu\nu}(x) - P_\mu(x)P_\nu(x) = C(1 - C)x^2(1 - x)^2$

is $< 0$ for every $x$ in $(0, 1)$ since $C > 1$. For $x \le 0$, $P_{\mu\nu}(x)$ and $P_\nu(x)$ both
equal zero and for $x \ge 1$ they both equal 1. Therefore our conditions of paragraph
1 are fulfilled. We see that $C_n$ tends towards unity as $n \to \infty$, therefore
for every $x$ in $(0, 1)$ $P_{\mu\nu}(x) - P_\mu(x)P_\nu(x)$ tends towards zero; we have "convergence
towards independence" but by no means independence.
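The algebra of this first example can be replayed in exact arithmetic. The sketch below is an illustration only: it assumes $C_n = 2^n/(2^n - 1)$, $P_{\mu\nu}(x) = C_n x^2(1 - x^2/2^n)$ and $P_\nu(x) = C_n x(1 - x/2^n)$ on $(0, 1)$, and checks that their difference equals $C(1 - C)x^2(1 - x)^2$, is negative, and shrinks as $n$ grows. The function names are hypothetical.

```python
from fractions import Fraction


def c_n(n):
    """Normalising constant of the density C_n (1 - x_1 x_2 ... x_n)."""
    return Fraction(2 ** n, 2 ** n - 1)


def p_pair(x, n):
    """P_{mu nu}(x, x) inside the unit square."""
    return c_n(n) * x * x * (1 - x * x / 2 ** n)


def p_single(x, n):
    """P_nu(x) = P_{mu nu}(x, 1) inside the unit interval."""
    return c_n(n) * x * (1 - x / 2 ** n)


def dependence_gap(x, n):
    """P_{mu nu}(x) - P_mu(x) P_nu(x)."""
    return p_pair(x, n) - p_single(x, n) ** 2


def closed_form_gap(x, n):
    """C (1 - C) x^2 (1 - x)^2, the right-hand side of (7)."""
    c = c_n(n)
    return c * (1 - c) * x ** 2 * (1 - x) ** 2
```

Since $C(1 - C) = -2^n/(2^n - 1)^2$, the gap is of order $2^{-n}$, which is one way to see the "convergence towards independence" mentioned in the text.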

This example was based on a symmetric density. Let us give an example of
asymmetric and arithmetic distributions. For the sake of simplicity let $P_1(x)$,
$P_2(x), \dots$ be arithmetic distributions each with only three steps at $x = 0$, 1
and 2. As starting point we take the $n$-dimensional arithmetic distribution
$v(x_1, x_2, \dots, x_n)$ which gives the probability that the first result equals $x_1$, the
second $x_2$, $\dots$, the $n$th $x_n$, the $x_\nu$ being equal to 0 or 1 or 2; thus $v(x_1, x_2, \dots,
x_n)$ takes $3^n$ values the sum of which equals unity. We deduce the two dimensional
distributions $v_{\mu\nu}(x, y)$, e.g. $v_{12}(x, y) = \sum_{x_3, \dots, x_n} v(x, y, x_3, \dots, x_n)$, the
probability that the first result equals $x$, the second $y$, and finally the $v_1(x) =
\sum_y v_{12}(x, y)$, etc. According to the definitions of $P_\nu(x)$ and $P_{\mu\nu}(x)$ we have then:

$P_\nu(x) = 0$ for $x < 0$,
$P_\nu(x) = v_\nu(0)$ for $0 \le x < 1$,
$P_\nu(x) = v_\nu(0) + v_\nu(1)$ for $1 \le x < 2$,
$P_\nu(x) = 1$ for $2 \le x$;

$P_{\mu\nu}(x) = 0$ for $x < 0$,
$P_{\mu\nu}(x) = v_{\mu\nu}(0, 0)$ for $0 \le x < 1$,
$P_{\mu\nu}(x) = v_{\mu\nu}(0, 0) + v_{\mu\nu}(1, 0) + v_{\mu\nu}(0, 1) + v_{\mu\nu}(1, 1)$ for $1 \le x < 2$,
$P_{\mu\nu}(x) = 1$ for $2 \le x$.

Now we subject $v(x_1, \cdots, x_n)$ to the following conditions: Every $v(x_1, \cdots, x_n)$ equals zero if it contains either: at least two "zeros," or: at least one "zero" and one "one," or: at least two "ones." All the other $v$-values are supposed to be different from zero. Then we have

$$v_{\mu\nu}(0, 0) = v_{\mu\nu}(1, 0) = v_{\mu\nu}(0, 1) = v_{\mu\nu}(1, 1) = 0$$

therefore $P_{\mu\nu}(x) = 0$ for $x < 2$ and $P_{\mu\nu}(x) = 1$ for $x \ge 2$. On the other hand $v_\mu(0) = v(2, 2, \cdots, 2, 0, 2, \cdots, 2)$ and $v_\mu(1) = v(2, 2, \cdots, 2, 1, 2, \cdots, 2)$, therefore $P_\mu(x) \ne 0$ for $0 \le x < 2$ and we have thus for every finite $n$

$$P_{\mu\nu}(x) - P_\mu(x)P_\nu(x) \begin{cases} = 0 & \text{for } x < 0 \text{ and } x \ge 2, \\ < 0 & \text{for } 0 \le x < 2. \end{cases}$$

Therefore the condition (b) of paragraph 1 is fulfilled and thus (12) paragraph 3 holds.
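The construction can be checked by direct enumeration for a small $n$. The following is a minimal Python sketch and not part of the original argument; the choice $n = 4$, the equal weights placed on the admissible outcomes, and the helper names (`build_distribution`, `cdf_one`, `cdf_two`, `dependence_gap`) are assumptions made purely for illustration.

```python
from fractions import Fraction
from itertools import product


def build_distribution(n):
    """Equal weights (an assumption) on the outcomes allowed by the conditions:
    at most one of the n results differs from 2."""
    allowed = [t for t in product((0, 1, 2), repeat=n)
               if sum(1 for x in t if x != 2) <= 1]
    weight = Fraction(1, len(allowed))
    return {t: weight for t in allowed}


def cdf_one(v, mu, x):
    """P_mu(x): probability that result number mu is <= x."""
    return sum(p for t, p in v.items() if t[mu] <= x)


def cdf_two(v, mu, nu, x):
    """P_mu_nu(x): probability that results mu and nu are both <= x."""
    return sum(p for t, p in v.items() if t[mu] <= x and t[nu] <= x)


def dependence_gap(v, mu, nu, x):
    """P_mu_nu(x) - P_mu(x) * P_nu(x)."""
    return cdf_two(v, mu, nu, x) - cdf_one(v, mu, x) * cdf_one(v, nu, x)


v = build_distribution(4)
for x in (-1, 0, 1, 2):
    print(x, dependence_gap(v, 0, 1, x))
```

With equal weights and $n = 4$ there are nine admissible outcomes, so the gap is $-1/81$ at $x = 0$, $-4/81$ at $x = 1$, and zero for $x < 0$ and $x \ge 2$, in agreement with the inequalities above.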

I hope to have the opportunity to discuss more general applications of this 
theorem later. 

A generalization of the strong law of large numbers may be given in a simi- 
lar way. 


REFERENCES 


[1] B. H. Camp, The Mathematical Part of Elementary Statistics, New York, 1934, page 256.

[2] R. v. Mises, "Die Gesetze der großen Zahl für statistische Funktionen," Monatshefte für Math. u. Physik, 1936, p. 105-128.

[3] H. Geiringer, "Sur les variables aléatoires arbitrairement liées," Revue de l'Union Interbalcanique, 1938, p. 6.


BRYN MAWR COLLEGE,
BRYN MAWR, PENNSYLVANIA.





CONDITIONS FOR UNIQUENESS IN THE PROBLEM OF MOMENTS 
By M. G. KENDALL 


It was shown by Stieltjes [1] that in some circumstances it is possible for two 
different frequency distributions to have the same set of moments. For in- 
stance, the integral 


| fe dz 


around a contour consisting of the positive z-axis, the infinite quadrant and 
the positive y-axis is seen to be zero and it follows that 


$$\int_0^{\infty} x^n e^{-x^{1/4}} \sin x^{1/4}\,dx = 0.$$


Thus the frequency distribution

(1)    $dF = \tfrac{1}{24}\, e^{-x^{1/4}} (1 - \lambda \sin x^{1/4})\,dx, \qquad 0 \le x \le \infty, \quad 0 < \lambda < 1$

has moments which are independent of $\lambda$, and equation (1) may be regarded as defining a whole family of distributions each of which has the same moments. It is easy to see that moments of all orders exist, and in fact

$\mu_r'$ (about the origin) $= \tfrac{1}{6}(4r + 3)!$.
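The independence of the moments from $\lambda$ can be verified exactly. The sketch below is an illustration only: it assumes the density $dF = \tfrac{1}{24} e^{-x^{1/4}}(1 - \lambda \sin x^{1/4})\,dx$ on $(0, \infty)$, substitutes $x = u^4$, and uses the standard closed form $\int_0^\infty u^k e^{-(1-i)u}\,du = k!/(1-i)^{k+1}$. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def gaussian_power(k):
    """Return (re, im) of (1 + i)**k using exact integer arithmetic."""
    re, im = 1, 0
    for _ in range(k):
        re, im = re - im, re + im  # multiply by (1 + i)
    return re, im


def sine_integral(k):
    """Exact integral of u**k * exp(-u) * sin(u) over (0, inf).

    It is the imaginary part of k!/(1 - i)**(k + 1),
    and 1/(1 - i) = (1 + i)/2.
    """
    _, im = gaussian_power(k + 1)
    return Fraction(factorial(k) * im, 2 ** (k + 1))


def moment(r, lam):
    """r-th moment about the origin of the assumed density, via x = u**4."""
    k = 4 * r + 3
    plain = 4 * factorial(k)        # 4 * integral of u**k * exp(-u)
    wiggle = 4 * sine_integral(k)   # 4 * integral of u**k * exp(-u) * sin(u)
    return Fraction(1, 24) * (plain - lam * wiggle)


for r in range(4):
    print(r, moment(r, Fraction(1, 2)), Fraction(factorial(4 * r + 3), 6))
```

The sine term vanishes for every $k = 4r + 3$ because $(1 + i)^{4(r+1)} = (-4)^{r+1}$ is real, which is why $\lambda$ drops out of every moment.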


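A second example of the same kind, also due to Stieltjes, is the distribution

(2)    $dF = \frac{1}{e^{1/4}\sqrt{\pi}}\, x^{-\log x}\{1 - \lambda \sin (2\pi \log x)\}\,dx, \qquad 0 \le x \le \infty, \quad 0 < \lambda < 1,$

for which

$\mu_r' = e^{\frac{1}{4} r(r+2)}.$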

The question naturally arises, what are the conditions under which a given 
set of moments determines a frequency distribution uniquely? The question 
is of great interest to mathematicians, being closely linked with problems in the 
theory of asymptotic series, continued fractions and quasi-analytic functions; 
and it also has importance for statisticians since there is sometimes occasion to 
be satisfied that a problem of finding a frequency distribution has been uniquely 
solved by the ascertainment of its moments or semi-invariants. Stieltjes himself considered a more general problem: given a set of constants $c_0, c_1, \cdots, c_r, \cdots$ does there exist a function $F$, non-decreasing and possessing an infinite number of points of increase, such that

(3)    $\int_0^{\infty} x^r\,dF = c_r$


and under what conditions is F unique, except for an additive constant? 
Stieltjes showed that if we express the series

(4)    $\sum_{r=0}^{\infty} (-)^r \frac{c_r}{z^{r+1}}$

as a continued fraction of the form

(5)    $\frac{1}{a_1 z\,+}\ \frac{1}{a_2\,+}\ \frac{1}{a_3 z\,+}\ \frac{1}{a_4\,+} \cdots \frac{1}{a_{2n-1} z\,+}\ \frac{1}{a_{2n}\,+} \cdots$

it is a necessary and sufficient condition for the existence of at least one $F$ that all the $a$'s be positive; and that the function is unique or not according as the series $\sum_{r=1}^{\infty} a_r$ diverges or converges. (If the $a$'s are positive it must do one or the other.) The integral of equation (3) is to be interpreted in the general Stieltjes sense, so that the result applies to discontinuous as well as to continuous distributions. This is also true of the results obtained below.

Hamburger [2] discussed the similar problem when the limits of the integral in equation (3) are $\pm\infty$, and showed that a function $F$ exists if the expression of (4) as a continued fraction of the form


$\frac{b_0}{a_0 + z\,+}\ \frac{b_1}{a_1 + z\,+}\ \frac{b_2}{a_2 + z\,+} \cdots$


gives positive values of the $b$'s. In order that $F$ may be unique it is necessary and sufficient that the continued fraction be completely (vollständig) convergent in the sense defined by Hamburger.

Unfortunately these criteria, though mathematically complete, are not very 
useful to statisticians because as a rule it is too difficult to express the coefficients 
a and b explicitly enough in terms of the given c’s to enable questions of sign or 
of convergence to be decided. So far as I know, no more convenient criterion 
for the general Stieltjes problem has been found; but progress is possible if one 
considers the narrower question: given a set of moments, is the distribution 
which furnished them unique, that is to say, can any other distribution have 
furnished them? This is more limited than the Stieltjes problem because we 
know that at least one solution exists. 

Contributions to this subject have been made by Lévy [3] and Carleman [4]. Lévy shows that if moments of all orders exist and are positive it is a sufficient condition for them to determine a distribution uniquely that $\mu_n^{1/n}/n$ remains finite as $n$ tends to infinity. (Here and elsewhere in this paper $\mu_r$ refers to the moment of order $r$ about any point, not necessarily the mean.) Carleman shows that, for the case of limits $-\infty$ to $+\infty$, the moments determine the distribution uniquely if

$\sum_{r=0}^{\infty} \frac{1}{(\mu_{2r})^{1/(2r)}}$

diverges. For the limits 0 to $\infty$ he gives the corresponding series

$\sum_{r=0}^{\infty} \frac{1}{(\mu_r)^{1/(2r)}},$

a criterion which can be improved upon, as will be shown below.

The purpose of this paper is to develop criteria of this kind more systematically 
and to give more general criteria suitable in cases where the moments are not 
known explicitly but the behavior of the frequency distribution at its terminals 
is known. 

Three preliminary points necessary for the later argument may be noted. 

(1) Define the absolute moment of order $r$ by

$\nu_r = \int_{-\infty}^{\infty} |x^r|\,dF$

and recall that

$\nu_1 \le \nu_2^{1/2} \le \nu_3^{1/3} \le \cdots \le \nu_r^{1/r} \le \cdots$

(cf. Hardy and others, [5]). In other words the quantities $\nu_r^{1/r}$ form an increasing positive sequence and their reciprocals a decreasing positive sequence.


(2) The quantity $\nu_n^{1/n}/n$ must either tend to a limit or diverge to infinity as $n \to \infty$. For suppose that

$\overline{\lim}\ \nu_n^{1/n}/n = k,$
$\underline{\lim}\ \nu_n^{1/n}/n = l.$

Writing temporarily $\nu_n^{1/n} = a_n$, we have that, given $\epsilon$ there is an $N$ such that

$a_n/n > k - \epsilon$

for an infinity of values of $n$ greater than $N$. Similarly there is an $M$ such that

$a_n/n < l + \epsilon$

for an infinity of values of $n$ greater than $M$. Now choose $p$ such that $a_p$, $a_{p+1}$ are two consecutive values, one near the upper limit and one near the lower limit. This can always be done and we can take $p$ as large as we please. We then have

$a_p > p(k - \epsilon)$
$a_{p+1} < (p + 1)(l + \epsilon)$

and hence, since $a_{p+1} > a_p$,

$(k - \epsilon)p < (p + 1)(l + \epsilon)$

giving

$k - \epsilon < l + \epsilon + \frac{l + \epsilon}{p}.$

Thus $k - l$ can be made as small as we please and is thus zero.

The argument can be very simply adapted to the case in which $k$ is infinite, and if $l$ is not finite $k$, being not less than $l$, is infinite. Thus as $n \to \infty$ either $\lim a_n/n$ exists or $a_n/n \to \infty$.

(3) If any moment fails to converge, so will all moments of higher order. It 
is evident that more than one distribution can exist having a limited number 
of finite moments given and the remainder infinite. Thus we need only consider 
the case when moments of all orders exist. Furthermore, if any even moment exists the absolute moment of next lowest order must exist; for if $\int_{-\infty}^{\infty} x^{2r}\,dF$ exists, then each of $\int_{-\infty}^{0} x^{2r}\,dF$ and $\int_{0}^{\infty} x^{2r}\,dF$ exist separately, each being positive. Hence $\int_{-\infty}^{0} x^{2r-1}\,dF$ and $\int_{0}^{\infty} x^{2r-1}\,dF$ exist separately and thus

$\int_{-\infty}^{\infty} |x^{2r-1}|\,dF = -\int_{-\infty}^{0} x^{2r-1}\,dF + \int_{0}^{\infty} x^{2r-1}\,dF$

exists. Hence we need only consider the case in which absolute moments of all orders exist.
THEOREM 1. A set of moments determines a distribution uniquely if the series $\sum_{r=0}^{\infty} \frac{\nu_r t^r}{r!}$ converges for some real non-zero $t$.

Consider the characteristic function

$\phi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF.$

This is uniformly continuous in $t$, and so are its derivatives of all orders. Thus we have, in the neighborhood of $t = 0$ the Maclaurin expansion

$\phi(t) = \sum_{r=0}^{n} \frac{(it)^r}{r!}\,\mu_r + R.$


¹ This proof is necessary to the use of limits in the following theorems, but Theorems 2 and 3 are equally valid if $\overline{\lim}$ is substituted for lim therein. It is not generally true that if $a_n$ and $b_n$ are increasing monotonic sequences either $\lim a_n/b_n$ exists or $a_n/b_n \to \infty$ as $n \to \infty$.












Consequently, under the condition of the theorem, which implies that $\sum_{r=0}^{\infty} \frac{(it)^r}{r!}\,\mu_r$ is absolutely convergent for some radius $\rho$, $\phi(t)$ has a Taylor expansion in the neighborhood of the origin and is thus uniquely determined by the moments for $t < \rho$. Furthermore, in the neighborhood of $t = t_0$ we have

$\phi(t) = \sum_{r=0}^{n} \frac{(t - t_0)^r}{r!} \left\{ i^r \int_{-\infty}^{\infty} x^r e^{i t_0 x}\,dF \right\} + R.$

The modulus of the coefficient of $\frac{(t - t_0)^r}{r!}$ is not greater than $\nu_r$. Therefore $\phi(t)$ can be expanded in the neighborhood of $t = t_0$ in a Taylor series with a radius of convergence at least equal to $\rho$. Hence the function defining $\phi(t)$ in the neighborhood of the origin can be continued analytically throughout the range $-\infty$ to $+\infty$ and $\phi(t)$ is uniquely determined in that range.

But the characteristic function uniquely determines the distribution; and hence the theorem follows.

As a result of Theorem 1 we have the following generalization of the criterion given by Lévy.

THEOREM 2. A set of moments completely determines a distribution if $\lim \nu_n^{1/n}/n$ is finite.

It has already been seen that unless $\nu_n^{1/n}/n$ becomes infinite the limit exists. By the Cauchy test for convergence the series $\sum \frac{\nu_r t^r}{r!}$ converges if

(7)    $\lim_{n \to \infty} \left( \frac{\nu_n t^n}{n!} \right)^{1/n} < 1.$

As $n \to \infty$, $(n!)^{1/n}$ tends, in accordance with Stirling's theorem, to $(\sqrt{2\pi n}\, e^{-n} n^n)^{1/n}$, i.e. to $n/e$. Consequently the condition (7) becomes

$\lim\, [\nu_n^{1/n}/n]\, e t < 1.$

Thus if $\lim \nu_n^{1/n}/n = k$, say, the inequality (7) is satisfied for $t < 1/(ek)$ and the theorem follows.
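The Stirling step can be illustrated numerically. The sketch below is only an illustration under a stated assumption: it takes $\nu_n = n!$ (the moments of $dF = e^{-x}\,dx$ on $0$ to $\infty$), evaluates $\nu_n^{1/n}/n$ through the log-gamma function, and reads off the bound $t < 1/(ek)$. The function names are hypothetical.

```python
from math import e, exp, lgamma, log


def root_ratio(n):
    """(n!)**(1/n) / n, computed through log-gamma to avoid overflow."""
    return exp(lgamma(n + 1) / n - log(n))


def radius_bound(k):
    """Largest t allowed by the condition lim [nu_n**(1/n) / n] * e * t < 1."""
    return 1.0 / (e * k)


for n in (10, 100, 10_000, 1_000_000):
    print(n, root_ratio(n))

print("limit 1/e =", 1 / e, " bound on t =", radius_bound(1 / e))
```

For this choice $k = 1/e$, so the series of Theorem 1 converges for $t < 1$.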

An important corollary, which enables us to disregard the absolute moments 
(which may not be given if part of the range is negative) is 

THEOREM 3. A set of moments uniquely determines a distribution if $\lim_{n \to \infty} \mu_{2n}^{1/(2n)}/n$ is finite.

For $\nu_{2n-1}^{1/(2n-1)} \le \nu_{2n}^{1/(2n)} = \mu_{2n}^{1/(2n)}$.


Thus, $\lim \frac{1}{2n-1}\,\nu_{2n-1}^{1/(2n-1)} \le \lim \frac{1}{2n-1}\,\mu_{2n}^{1/(2n)} \le \lim \frac{2}{2n}\,\mu_{2n}^{1/(2n)}$







and is therefore finite if the limit on the right is finite. Thus $\lim \nu_n^{1/n}/n$, which cannot be greater than the greater of the two limits of $\nu_{2n-1}^{1/(2n-1)}/(2n - 1)$ and $\nu_{2n}^{1/(2n)}/(2n)$, must be finite; and the theorem follows from Theorem 2.


Now consider the series $\sum_{r=0}^{\infty} \frac{1}{\nu_r^{1/r}}$. Since the successive terms form a monotonic sequence it is a sufficient as well as a necessary condition for convergence that $n/\nu_n^{1/n}$ tend to zero. Thus, if the series is divergent $n/\nu_n^{1/n}$ cannot tend to zero and so $\nu_n^{1/n}/n$ cannot become infinite. Hence it must tend to a finite limit, which may in particular be zero. Hence from Theorem 3 we get

THEOREM 4. A frequency distribution is uniquely determined by its moments if $\sum_{r=0}^{\infty} \frac{1}{\nu_r^{1/r}}$ diverges.

Since $1/\nu_r^{1/r}$ is a decreasing sequence the series $\sum 1/\nu_r^{1/r}$ converges or diverges with $\sum 1/\nu_{2r}^{1/(2r)}$. The Carleman criterion, given by him for the case of limits $\pm\infty$, follows. For the case of limits 0 to $\infty$ the absolute moments are the same as the moments and the criterion can be the divergence of either $\sum 1/\mu_r^{1/r}$ or $\sum 1/\mu_r^{1/(2r)}$. Since $\mu_r$ is greater than unity in the type of case under consideration the former series provides a more stringent test than that given by Carleman.

At first sight it is rather surprising that the uniqueness of the distribution 
depends only on the behavior of the even moments, particularly when, by a 
simple extension of the above result, it is seen that a sufficient condition for 
uniqueness is the divergence of 2 1 wi!” or > 1/pS™ or any infinite subset 
chosen from the moments. It will, however, be remembered that the odd 
moments are conditioned to some extent by the even moments, and that uniqueness is really determined by the limiting form of $\nu_n$ as $n$ tends to infinity.

It is evident that other tests may be derived from Theorem 1 by using the 
various tests for the convergence of an infinite series. For instance it is a suffi- 
cient condition for a set of moments to determine uniquely a distribution with 
positive range that 


Mn Mn+1 - 1+2%+0(4), chen * 1 
n n 


ni/ (n+ 1)! B>0 


i.e. that 
(8) m =1+1+0(4), y>0. 
It may be noted in passing that the distribution

$dF = e^{-x}\,dx$

for which

$\mu_r'$ (about origin) $= r!$

is completely determined by its moments. In fact, by direct reference to Theorem 1 we see that the series $\sum (it)^r$ converges for $t < 1$.







A frequency distribution of finite range is uniquely determined by its moments. For if the range is 0 to $A$ we have

$\mu_r = \int_0^A x^r\,dF \le A^r$

and hence $1/\mu_r^{1/r} \ge 1/A$ so that the series $\sum 1/\mu_r^{1/r}$ is divergent.

A proof for the case when the frequency distribution is continuous has been 
given by Lévy, though on entirely different lines from the above. 

THEOREM 5. A frequency distribution of infinite range is uniquely determined by its moments if it tends to zero at the infinite terminals faster than $e^{-x}$.

Consider first of all the case when only one end of the range is infinite, so 
that we may take the range to be 0 to o. 

If $(\mu_n/n!)^{1/n}$ has a finite limit the distribution is unique, by Theorem 2. We have then only to consider the cases (if any) in which $(\mu_n/n!)^{1/n}$ tends to infinity. It will be shown that in fact such cases do not occur.

Given any (small) $\epsilon$ there exists an $X$ such that

$\frac{f(x)}{e^{-x}} < \epsilon, \qquad x > X,$

where $f(x)$ is the distribution. Thus

(9)    $\int_X^{\infty} f(x)\,x^n\,dx < \epsilon \int_X^{\infty} e^{-x} x^n\,dx < \epsilon\, n!.$

This is true for all $n$ and $X$ is independent of $n$. Now,

$\int_0^{\infty} f(x)\,x^n\,dx = \int_0^{X} f(x)\,x^n\,dx + \int_X^{\infty} f(x)\,x^n\,dx.$

The first integral on the right is not greater than $X^n$. The integral on the left tends, for large $n$, to something of greater order than $n!$, by our hypothesis, and hence to something of greater order than $n^n$. This is of greater order than $X^n$ (since $X$, however large, is independent of $n$) and consequently the second integral on the right is also of greater order than $n!$. But this is contrary to equation (9).

The case for the range which is infinite in both directions may be dealt with 
similarly. 

It is easily seen that the two examples of equations (1) and (2) do not tend to zero at infinity faster than $e^{-x}$.

Except for the general result of Stieltjes, all the above criteria provide sufficient conditions, but whether the condition of Theorem 1 is also necessary is
not certain. An inquiry into the circumstances in which the moment-series 
of Theorem 1 does not converge throws some light on the question. 

It will be remembered that the characteristic function always exists and is uniformly continuous in $t$. Since the moments of all orders are assumed to exist we always have

$\left[ \frac{d^r \phi}{dt^r} \right]_{t=0} = (i)^r \mu_r.$

Thus, if $\phi(t)$ can be expanded in an infinite Taylor series that series must be $\sum \frac{(it)^r}{r!}\,\mu_r$. And if this series does not converge then $\phi(t)$ cannot be expanded as an infinite Taylor series. But it can always be expanded in the finite form with remainder

$\phi(t) = \sum \frac{(it)^r}{r!}\,\mu_r + R.$

Thus, when the series does not converge, $\phi(t)$ can be expanded in powers of $t$ only asymptotically.

Now it is known that there exist an infinite number of functions which have 
a given set of coefficients in an asymptotic expansion; for instance, if $\psi(t)$ has an asymptotic expansion in $t$ the functions $\psi(t) + \lambda e^{-1/t}$ all have the same
expansion. It is therefore hardly surprising that when the conditions of 
Theorem 1 break down there can be more than one frequency distribution with 
the same set of moments. 

But it does not follow from what has been said that there must be more 
than one frequency distribution. There must be more than one function, but 
those functions may not qualify as frequency distributions, e.g. they may be 
negative in part of the range. In the example just given $e^{-1/t}$ cannot be a
characteristic function, for it does not obey the well-known condition that ¢(t) 
and ¢(—t) should be conjugate. 

However, the question is more of mathematical than of statistical interest 
since the criteria provided above are likely to be adequate for the distributions 
encountered in practice. For example they establish the uniqueness of the Pear- 
son curves (including the normal curve), the Poisson and the binomial. It 
would seem that distributions like those of equations (1) and (2) will appear 
only as statistical curiosities. 


REFERENCES 


[1] J. Stieltjes, Recherches sur les fractions continues, Oeuvres, Groningen, 1918.

[2] H. Hamburger, "Über eine Erweiterung des Stieltjesschen Momentproblems," Math. Annalen, Vol. 81 (1920), p. 235, and Vol. 82 (1921), pp. 120 and 168.

[3] P. Lévy, Calcul des Probabilités, 1925, Paris.

[4] T. Carleman, Les Fonctions Quasi-analytiques, 1925, Paris.

[5] G. H. Hardy, J. E. Littlewood and G. Pólya, Inequalities, 1934, Cambridge, England.


LONDON, ENGLAND.





ON SAMPLES FROM A NORMAL BIVARIATE POPULATION 
By C. T. Hsu


1. Introduction. In a number of papers written during the last ten years, J. Neyman and E. S. Pearson¹ have discussed certain general principles underlying the choice of tests of statistical hypotheses. They have suggested that any formal treatment of the subject requires in the first place the specification of (i) the hypothesis to be tested, say $H_0$, (ii) the admissible alternative hypotheses. An appropriate test will then consist of a rule to be applied to observational data, for rejecting $H_0$ in such a way that (iii) the risk of rejecting $H_0$ when it is true is fixed at some desired value (e.g., 0.05 or 0.01), (iv) the risk of failing to reject $H_0$ when some one of the admissible alternatives is true is kept as small as possible. With these general principles in mind, they have investigated how best the condition (iv) may be satisfied in different classes of problems. In many cases, though not in all, it has been found that the conditions are satisfied by the test obtained from the use of what has been termed the likelihood ratio, [9], [10], [14]. Once the problem has been specified, the test criterion is usually very easily found, although its sampling distribution, if $H_0$ is true, often presents great difficulties. In the present paper, I propose to use this method to obtain appropriate tests for a number of hypotheses concerning two normally correlated variables. The investigation was suggested by a recent application of the method by W. A. Morgan [6] to a problem originally discussed by D. J. Finney [8].


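2. The hypotheses and the appropriate criteria. A sample of two variables $x_1$ and $x_2$ is supposed to have been drawn at random from a normal bivariate population, with the distribution

(1)    $p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho_{12}^2}} \exp\left\{ -\frac{1}{2(1 - \rho_{12}^2)} \left[ \left(\frac{x_1 - \xi_1}{\sigma_1}\right)^2 - 2\rho_{12}\left(\frac{x_1 - \xi_1}{\sigma_1}\right)\left(\frac{x_2 - \xi_2}{\sigma_2}\right) + \left(\frac{x_2 - \xi_2}{\sigma_2}\right)^2 \right] \right\}$

where $\xi_1$, $\xi_2$, $\sigma_1$, $\sigma_2$, and $\rho_{12}$ are the population parameters.

Morgan tested the hypothesis that the variances of the two variables are equal, i.e.,

$H_1$: $\sigma_1 = \sigma_2$.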


1 See bibliography at the end of the paper. 


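Other hypotheses that will be considered in the present paper are as follows:

$H_2$: Assuming $\sigma_1 = \sigma_2$; to test $\rho_{12} = \rho_0$.

$H_3$: Assuming $\sigma_1 = \sigma_2$; to test $\xi_1 = \xi_2$.

$H_4$: To test simultaneously $\sigma_1 = \sigma_2$, $\rho_{12} = \rho_0$.

$H_5$: To test simultaneously $\sigma_1 = \sigma_2$, $\xi_1 = \xi_2$.

$H_6$: Assuming $\sigma_1 = \sigma_2$ and $\xi_1 = \xi_2$; to test $\rho_{12} = \rho_0$.

$H_7$: Assuming $\sigma_1 = \sigma_2$, and $\rho_{12} = \rho_0$; to test $\xi_1 = \xi_2$.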


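Derivation of the criteria. Let $x_{1i}$, $x_{2i}$ be the measurements of the two characters on the $i$th individual of the sample, then the joint elementary probability law of the two sets of $n$ observations $E = (x_{11}, x_{12}, \cdots, x_{1n}, x_{21}, x_{22}, \cdots, x_{2n})$ is

$p(E \mid \xi_1, \xi_2, \sigma_1, \sigma_2, \rho_{12}) = \left( \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho_{12}^2}} \right)^{n} \exp\left\{ -\frac{1}{2(1 - \rho_{12}^2)} \sum_{i=1}^{n} \left[ \left(\frac{x_{1i} - \xi_1}{\sigma_1}\right)^2 - 2\rho_{12}\left(\frac{x_{1i} - \xi_1}{\sigma_1}\right)\left(\frac{x_{2i} - \xi_2}{\sigma_2}\right) + \left(\frac{x_{2i} - \xi_2}{\sigma_2}\right)^2 \right] \right\}.$

It will be convenient to denote by A, B, C, D, the following conditions of the population from which the sample is supposed to be drawn.

(A) that stated in equation (1).
(B) that stated in the equation for $H_1$, namely $\sigma_1 = \sigma_2 = \sigma$ ($\sigma$ being unspecified).
(C) $\xi_1 = \xi_2 = \xi$ ($\xi$ being unspecified).
(D) $\rho_{12} = \rho_0$.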


Neyman and Pearson's method affords a simple rule for obtaining appropriate test criteria once two sets of conditions have been defined. These are

(a) the conditions which can be assumed to be satisfied in any case, and

(b) the conditions which are satisfied if the hypothesis to be tested is true.

The conditions (a) define a class $\Omega$ of admissible populations, and the conditions (b) define a sub-class $\omega$ of $\Omega$ to which the population must belong if the hypothesis tested be true.

The maximum value of $p(E \mid \xi_1, \xi_2, \sigma_1, \sigma_2, \rho_{12})$ when the parameters vary in such a way that the population sampled always belongs to $\Omega$, is called $p(\Omega\ \text{max.})$. The maximum value when the population is restricted to $\omega$ is called $p(\omega\ \text{max.})$.

The likelihood ratio for testing the hypothesis specifying the subset $\omega$ has been defined to be

(3)    $\lambda = \frac{p(\omega\ \text{max.})}{p(\Omega\ \text{max.})}$

It will be seen that $0 \le \lambda \le 1$. By referring $\lambda$, or a monotonic function of $\lambda$, to its sampling distribution when the hypothesis tested is true, we obtain a scale on which to assess our judgment of the truth of the hypothesis tested.

For each of the hypotheses $H_1$ to $H_7$, $\lambda$ of (3) can be found. However, we shall use a more convenient criterion

(4)    $L = \lambda^{2/n}$

which is a monotonic function of $\lambda$.
Thus the respective test criteria are found to be:

For $H_1$:

(5)    $L_1 = \frac{4 s_1^2 s_2^2 (1 - r_{12}^2)}{(s_1^2 + s_2^2)^2 (1 - R_1^2)}$

where $R_1 = \frac{2 r_{12} s_1 s_2}{s_1^2 + s_2^2}$ is the estimate of $\rho_{12}$ when $\sigma_1$ and $\sigma_2$ are assumed to be equal.

For $H_2$:

(6)    $L_2 = \frac{(1 - \rho_0^2)(1 - R_1^2)}{(1 - \rho_0 R_1)^2}$

For $H_3$:

(7)    $L_3 = 1 \Big/ \left\{ 1 + \frac{(\bar{x}_1 - \bar{x}_2)^2}{s_1^2 + s_2^2 - 2 r_{12} s_1 s_2} \right\}$

For $H_4$:

(8)    $L_4 = \frac{4 (1 - \rho_0^2)\, s_1^2 s_2^2 (1 - r_{12}^2)}{(s_1^2 + s_2^2)^2 (1 - \rho_0 R_1)^2}$

For $H_5$:

(9)    $L_5 = \frac{4 s_1^2 s_2^2 (1 - r_{12}^2)}{\{ s_1^2 + s_2^2 + \frac{1}{2}(\bar{x}_1 - \bar{x}_2)^2 \}^2 (1 - R_2^2)}$

For $H_6$:

(10)    $L_6 = \frac{(1 - \rho_0^2)(1 - R_2^2)}{(1 - \rho_0 R_2)^2}$

where $R_2 = \frac{2 r_{12} s_1 s_2 - \frac{1}{2}(\bar{x}_1 - \bar{x}_2)^2}{s_1^2 + s_2^2 + \frac{1}{2}(\bar{x}_1 - \bar{x}_2)^2}$ is the estimate of $\rho_{12}$ when both the $\sigma$'s and the $\xi$'s are assumed to be equal.

For $H_7$:

(11)    $L_7 = 1 \Big/ \left\{ 1 + \frac{(1 + \rho_0)(\bar{x}_1 - \bar{x}_2)^2}{2(s_1^2 - 2\rho_0 r_{12} s_1 s_2 + s_2^2)} \right\}^2$

The different hypotheses are also given in Table V, at the end of this paper, together with the conditions defining sets of $\Omega$ and $\omega$, and the appropriate likelihood criteria.

To complete the solution we must find the distributions of $L$ or some monotonic function of $L$ in each case when the hypothesis tested is true, in order to assess the significance of an observed value of $L$.

















3. The distributions of the criteria. In order to simplify the problem of 
finding the distributions of the criteria, consider the following transformation: 


tu = (Xi — Yi)/V2 
tex = (Xi + Yi)/V2. 


It is clear that in view of (1) X and Y will be two normally correlated variables. 
We shall denote this property by A’ corresponding to A. The conditions B’, 
C’, D’ corresponding to B, C, D respectively are as follows: 


(12) 











$B'$: $\rho_{XY} = 0$,

$C'$: $\xi_X = 0$,

$D'$: $\sigma_Y^2 = \gamma_0 \sigma_X^2$ (when $\rho_{XY} = 0$)

where

(13)    $\gamma_0 = \frac{1 + \rho_0}{1 - \rho_0}$.

Thus we have the equivalent hypotheses $H_1'$, $H_2'$, $\cdots$, $H_7'$ corresponding to $H_1$, $H_2$, $\cdots$, $H_7$. The likelihood ratios $L_1'$, $L_2'$, $\cdots$, $L_7'$ may be determined in the same way as before, and, in view of the transformation (12), it will be seen that they are equal to $L_1$, $L_2$, $\cdots$, $L_7$ respectively.

The tests of the hypotheses $H_1'$, $H_2'$, $H_3'$ are now seen to be well known.

The test of $H_1'$: $\rho_{XY} = 0$ is the test for significance of a correlation coefficient, and the criterion $L_1$ becomes

(14)    $L_1' = L_1 = 1 - r_{XY}^2$.









This test has been dealt with by Morgan [6] and Pitman [15], and has been 
referred to above. 
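The equality of the two forms of $L_1$ can be checked on a small sample. The following Python sketch is illustrative only. It assumes the rotation $X = (x_1 - x_2)/\sqrt{2}$, $Y = (x_1 + x_2)/\sqrt{2}$ (a sign convention chosen here so that $X$ carries the difference of the two characters), variances and covariances with divisor $n$, and made-up data; the function names are hypothetical.

```python
from math import sqrt


def second_moments(a, b):
    """Return (var_a, var_b, cov_ab) with divisor n (likelihood estimates)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    cab = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    return va, vb, cab


def l1_direct(x1, x2):
    """4 s1^2 s2^2 (1 - r12^2) / ((s1^2 + s2^2)^2 (1 - R1^2))."""
    v1, v2, c = second_moments(x1, x2)
    r12_sq = c * c / (v1 * v2)
    big_r1 = 2 * c / (v1 + v2)  # R1 = 2 r12 s1 s2 / (s1^2 + s2^2)
    return 4 * v1 * v2 * (1 - r12_sq) / ((v1 + v2) ** 2 * (1 - big_r1 ** 2))


def l1_rotated(x1, x2):
    """1 - r_XY^2 after rotating to the difference X and the sum Y."""
    big_x = [(a - b) / sqrt(2) for a, b in zip(x1, x2)]
    big_y = [(a + b) / sqrt(2) for a, b in zip(x1, x2)]
    vx, vy, cxy = second_moments(big_x, big_y)
    return 1 - cxy * cxy / (vx * vy)


# Made-up measurements, for illustration only.
x1 = [1.0, 2.5, 3.1, 4.8, 5.2, 7.0]
x2 = [0.8, 2.0, 3.9, 4.1, 6.3, 6.6]
print(l1_direct(x1, x2), l1_rotated(x1, x2))
```

The two printed values agree up to rounding error, whatever sample is supplied, since the covariance of $X$ and $Y$ is $\frac{1}{2}(s_1^2 - s_2^2)$.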

The test of $H_2'$: $\sigma_Y^2/\sigma_X^2 = \gamma_0$ when $\rho_{XY} = 0$ can be treated as an extension of Fisher's z-test [5], since $\gamma_0$ is specified. If we write

(15)    $u = \frac{S_Y^2}{S_X^2} = \frac{1 + R_1}{1 - R_1} = \frac{s_1^2 + s_2^2 + 2 r_{12} s_1 s_2}{s_1^2 + s_2^2 - 2 r_{12} s_1 s_2}$

the test criterion $L_2$ of (6) may be written

(16)    $L_2' = \frac{4u}{\gamma_0 (1 + u/\gamma_0)^2}.$

It is well known that if $H_2'$ is true, then

(17)    $p(u) = \frac{1}{\gamma_0 B[\frac{1}{2}(n - 1), \frac{1}{2}(n - 1)]} \left( \frac{u}{\gamma_0} \right)^{\frac{1}{2}(n-3)} \left( 1 + \frac{u}{\gamma_0} \right)^{-(n-1)}$

and the test appropriate to $H_2'$ and therefore of $H_2$ is the associated z-test ($z = \frac{1}{2} \log u/\gamma_0$) with degrees of freedom $f_1 = f_2 = n - 1$. It may be easily shown that the two values of $u$ cutting off equal tail areas from the distribution $p(u)$ will correspond to a single value of $L_2$.
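The agreement between the form of $L_2$ in $R_1$ and its form in $u$ can be confirmed numerically. This is a minimal sketch under stated assumptions: it takes $\gamma_0 = (1 + \rho_0)/(1 - \rho_0)$, $u = (1 + R_1)/(1 - R_1)$, $L_2 = (1 - \rho_0^2)(1 - R_1^2)/(1 - \rho_0 R_1)^2$ and $L_2 = 4u/\{\gamma_0(1 + u/\gamma_0)^2\}$; the function names are hypothetical.

```python
from math import log


def gamma0(rho0):
    """Variance ratio sigma_Y^2 / sigma_X^2 implied by rho_12 = rho0."""
    return (1 + rho0) / (1 - rho0)


def l2_from_r1(r1, rho0):
    """L2 written in terms of R1 and rho0."""
    return (1 - rho0 ** 2) * (1 - r1 ** 2) / (1 - rho0 * r1) ** 2


def l2_from_u(r1, rho0):
    """L2 written in terms of u = (1 + R1) / (1 - R1)."""
    w = (1 + r1) / (1 - r1) / gamma0(rho0)
    return 4 * w / (1 + w) ** 2


def z_statistic(r1, rho0):
    """z = (1/2) log(u / gamma0)."""
    u = (1 + r1) / (1 - r1)
    return 0.5 * log(u / gamma0(rho0))


for r1, rho0 in [(0.2, 0.5), (-0.4, 0.3), (0.7, 0.7)]:
    print(r1, rho0, l2_from_r1(r1, rho0), l2_from_u(r1, rho0), z_statistic(r1, rho0))
```

Because the second form depends on $u/\gamma_0$ only through $w/(1 + w)^2$, the values $w$ and $1/w$ (that is, $z$ and $-z$) give the same $L_2$, which is the sense in which two equal-tail values of $u$ correspond to a single value of the criterion.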

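The test of $H_3'$: $\xi_X = 0$ when $\rho_{XY} = 0$ is in the form of "Student's" $t$ test. If we write

(18)    $t^2 = \frac{(n - 1)(\bar{x}_1 - \bar{x}_2)^2}{s_1^2 + s_2^2 - 2 r_{12} s_1 s_2}$

it follows that the test criterion $L_3$ of (12) may be written

(19)    $L_3 = 1 \Big/ \left( 1 + \frac{t^2}{n - 1} \right).$

But it is well known that if $\xi_X = 0$, then

(20)    $p(t) = \frac{1}{\sqrt{n - 1}\, B[\frac{1}{2}(n - 1), \frac{1}{2}]} \left( 1 + \frac{t^2}{n - 1} \right)^{-\frac{1}{2} n}$

The 5% or 1% points of significance of $t$ may be obtained from Fisher's $t$-table [5] with degrees of freedom $f = n - 1$.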

The tests of $H_4$ and $H_5$. We infer from (14), (16) and (19) that $L_1$ is a function of $r_{XY}$, $L_2$ a function of $S_X$ and $S_Y$, and $L_3$ a function of $\bar{X}$ and $S_X$. It is clear that if $r_{XY}$ is distributed independently of $S_X$ and $S_Y$, then $L_1$ and $L_2$ are independent, i.e.,

(21)    $p(L_1, L_2) = p(L_1)\,p(L_2)$

and that if $r_{XY}$ is distributed independently of $\bar{X}$ and $S_X$, then $L_1$ and $L_3$ are independent, i.e.,

(22)    $p(L_1, L_3) = p(L_1)\,p(L_3).$

It is known that $\bar{X}$, $\bar{Y}$ are independent of $S_X$, $S_Y$, $r_{XY}$; and in addition that $r_{XY}$ is distributed independently of $S_X$, $S_Y$ if $\rho_{XY} = 0$. Therefore, if $H_1'$ is true, then the relations (21) and (22) hold. Hence, knowing $p(L_1)$ and $p(L_2)$, a very simple transformation and integration gives $p(L_4)$. Similarly, the distribution of $L_5$ may be readily derived from those of $L_1$ and $L_3$.

But from the distribution of $r_{XY}$ when $\rho_{XY} = 0$, by transformation (14), the distribution of $L_1$ assuming $H_1'$ true is found to be

(23)    $p(L_1) = \frac{1}{B[\frac{1}{2}(n - 2), \frac{1}{2}]}\, L_1^{\frac{1}{2}(n-4)} (1 - L_1)^{-\frac{1}{2}}.$

If $H_2'$ is true, from (17), by transformation (16) we have

(24)    $p(L_2) = \frac{1}{B[\frac{1}{2}(n - 1), \frac{1}{2}]}\, L_2^{\frac{1}{2}(n-3)} (1 - L_2)^{-\frac{1}{2}}.$

Again, if $H_3'$ is true, from (20), by transformation (19), we have

(25)    $p(L_3) = \frac{1}{B[\frac{1}{2}(n - 1), \frac{1}{2}]}\, L_3^{\frac{1}{2}(n-3)} (1 - L_3)^{-\frac{1}{2}}$

which is the same as the distribution of $L_2$. Therefore by comparing (21) and (22) we see that the distribution of $L_5$ when $H_5$ is true will be exactly the same as that of $L_4$ when $H_4$ is true. We shall therefore confine ourselves to the problem of obtaining the distribution of $L_4$ from those of $L_1$ and $L_2$.

Now

(26)    $p(L_1, L_2) = \frac{1}{B[\frac{1}{2}(n - 2), \frac{1}{2}]\, B[\frac{1}{2}(n - 1), \frac{1}{2}]}\, L_1^{\frac{1}{2}(n-4)} (1 - L_1)^{-\frac{1}{2}} L_2^{\frac{1}{2}(n-3)} (1 - L_2)^{-\frac{1}{2}}.$

Applying the transformation

(27)    $L_4 = L_1 L_2, \qquad Z = L_1,$

and integrating with respect to $Z$ from 0 to 1, we obtain

(28)    $p(L_4) = \frac{1}{2}(n - 2)\, L_4^{\frac{1}{2}(n-4)}, \qquad 0 \le L_4 \le 1.$

Thus we can construct the values of $L_4$ at the 5% and 1% levels for different values of $n$ as given in Table I.
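Percentage points follow directly from the distribution of $L_4$. The sketch below is an illustration under a stated assumption: it takes the density $p(L_4) = \frac{1}{2}(n - 2) L_4^{\frac{1}{2}(n-4)}$ on $0 \le L_4 \le 1$, so that the chance of a value below $L$ is $L^{\frac{1}{2}(n-2)}$ and the lower $\alpha$ point is $\alpha^{2/(n-2)}$. The function name is hypothetical.

```python
def critical_value(n, alpha):
    """Lower alpha point of L4: solve L ** ((n - 2) / 2) == alpha for L."""
    if n <= 2:
        raise ValueError("n must exceed 2")
    return alpha ** (2.0 / (n - 2))


for n in (5, 7, 10, 20, 60, 120):
    print(n, round(critical_value(n, 0.05), 4), round(critical_value(n, 0.01), 4))
```

Small values of $L_4$ are the significant ones, so an observed $L_4$ below the printed point leads to rejection at that level.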


TABLE I
5% and 1% values of $L_4$ (or $L_5$)

  n       5%
  5     .1357
  6     .2509
  7     .3017
  8     .3684
  9     .4249
 10     .4729
 12     .5493
 15     .6307
 20     .7169
 24     .7616
 30     .8074
 40     .8541
 60     .9019
120     .9505
  ∞    1.0000







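The test of $H_6'$. In the case of testing $H_6'$ ($\sigma_Y^2 = \gamma_0 \sigma_X^2$), assuming $\rho_{XY}$ and $\xi_X$ each to be zero, the likelihood estimate of $\sigma_X^2$ becomes $\sum X^2/n$ or $S_X^2 + \bar{X}^2$. The distribution of this quantity is the same as that of $S_X^2$ but with degrees of freedom $n$ instead of $n - 1$. Therefore, by analogy with the previous result (17) used in testing $H_2'$, if we write

(29)    $v = \frac{S_Y^2}{S_X^2 + \bar{X}^2} = \frac{1 + R_2}{1 - R_2},$

then the likelihood criterion of $H_6'$ becomes

(30)    $L_6 = \frac{4v}{\gamma_0 (1 + v/\gamma_0)^2}$

and

(31)    $p(v) = \frac{1}{\gamma_0 B[\frac{1}{2}(n - 1), \frac{1}{2} n]} \left( \frac{v}{\gamma_0} \right)^{\frac{1}{2}(n-3)} \left( 1 + \frac{v}{\gamma_0} \right)^{-(n - \frac{1}{2})}$

Hence the test appropriate to $H_6'$ is the associated z-test $z = \frac{1}{2} \log \frac{n v}{(n - 1)\gamma_0}$ with $f_1 = n - 1$, $f_2 = n$. We can use the z-table as before.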
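The test of $H_7'$. Here we test whether $\xi_X = 0$. It may be seen that $L_7$ is a function of $\bar{X}^2/(S_X^2 + S_Y^2/\gamma_0)$. Further, if we assume that $\rho_{XY} = 0$ and also that $\sigma_Y^2 = \gamma_0 \sigma_X^2$, then it will follow that $\sum (X - \bar{X})^2$ and $\frac{1}{\gamma_0} \sum (Y - \bar{Y})^2$ are each distributed independently as $\chi^2 \sigma_X^2$ with $n - 1$ degrees of freedom; and hence their sum is distributed as $\chi^2 \sigma_X^2$ with $2n - 2$ degrees of freedom. Also if $\xi_X = 0$ (and $H_7'$ is true) $\bar{X}$ will be distributed normally about zero with standard error $\sigma_X/\sqrt{n}$. Hence we may write

(32)    $L_7 = 1 \Big/ \left\{ 1 + \frac{t^2}{2n - 2} \right\}^2$

where

(33)    $t = \bar{X} \sqrt{\frac{n(2n - 2)}{\sum (X - \bar{X})^2 + \frac{1}{\gamma_0} \sum (Y - \bar{Y})^2}}$

and is distributed in accordance with "Student's" distribution with $2n - 2$ degrees of freedom,

(34)    $p(t) = \frac{1}{\sqrt{2n - 2}\, B[\frac{1}{2}(2n - 2), \frac{1}{2}]} \left( 1 + \frac{t^2}{2n - 2} \right)^{-\frac{1}{2}(2n-1)}$

In terms of original variables

(35)    $t^2 = \frac{(2n - 2)(1 + \rho_0)(\bar{x}_1 - \bar{x}_2)^2}{2(s_1^2 - 2\rho_0 r_{12} s_1 s_2 + s_2^2)}$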







4. Comparison of the $R_1$-test and $R_2$-test with the $r_{12}$-test in cases where $H_2$ and $H_6$ are true respectively. It will be noted that in the preceding discussion we have been concerned with three different tests of the hypothesis that $\rho_{12}$ has some specified value $\rho_0$. When there is no information available regarding the means and standard deviations of $x_1$ and $x_2$, the test is based on the sampling distribution of the ordinary product-moment coefficient $r_{12}$. If it may be assumed that $\sigma_1 = \sigma_2$, then we have the estimate

$R_1 = \frac{2 r_{12} s_1 s_2}{s_1^2 + s_2^2}.$

If besides $\sigma_1 = \sigma_2$, it may also be assumed that $\xi_1 = \xi_2$, then we have the estimate

$R_2 = \frac{2 r_{12} s_1 s_2 - \frac{1}{2}(\bar{x}_1 - \bar{x}_2)^2}{s_1^2 + s_2^2 + \frac{1}{2}(\bar{x}_1 - \bar{x}_2)^2}.$

From the point of view of testing hypotheses, all these criteria $r_{12}$, $R_1$, $R_2$ follow from the application of the likelihood ratio method. It will be noted that if $\sigma_1 = \sigma_2$, either the $r_{12}$ or the $R_1$ test may be used. But, insofar as the likelihood principle is accepted, the latter should be regarded as the "better" test. Again, if $\sigma_1 = \sigma_2$ and $\xi_1 = \xi_2$, all three tests may be used, but that based on $R_2$ will be the "best". A question of interest is to investigate just what is meant by the "better" or the "best" test. We may ask how far the improvements are sufficient to justify the use of the $R_1$ and $R_2$ tests in place of the more generally used $r_{12}$ test. One method of comparison is to examine what Neyman and Pearson [12] have termed the "power function" of the tests.
For example, when testing the hypothesis that a parameter $\theta$ has the value $\theta_0$ in the population sampled, the power of the test criterion $T$ with regard to the alternative hypothesis that $\theta = \theta_1 > \theta_0$ is given by the expression $\beta(\theta_1) = P\{T > T_\alpha \mid \theta = \theta_1\}$ where $T_\alpha$ is the value of $T$ at the level of significance $\alpha$. This quantity $\beta(\theta_1)$ measures the chance that the test as specified will detect the fact that $\theta = \theta_1$, i.e., the chance of rejecting the hypothesis when it is not true. A test whose power function is never less than that of any other test is termed the uniformly most powerful test.

If the permissible alternative hypotheses to $\theta = \theta_0$ are both $\theta < \theta_0$ and $\theta > \theta_0$, then the power of the test $T$ is given by the expression

$\beta(\theta_1) = 1 - P\{T'_\alpha < T < T''_\alpha \mid \theta_1\}$

where $T'_\alpha$ and $T''_\alpha$ are the values of $T$ at both ends of the distribution at the level of the significance $\alpha$. When the test is such that the power function has a minimum value $\alpha$ at $\theta = \theta_0$, it is said to be unbiased.

A test is termed biased if, for certain alternative hypotheses $\theta \ne \theta_0$, the chance of rejecting the hypothesis $\theta = \theta_0$ is less than the chance of rejecting this hypothesis when it is true.









In what follows it is proposed to compare the power functions of the tests 
based on rz, R,, and Re in order to obtain more complete evidence of the 
extent to which one is “‘better’”’ than the other. 

The distribution of $R_1$.² We have obtained the distribution of $u$ when $H_2'$, and
therefore $H_2$, is true. We are now able to find the distribution of $R_1$ by applying
the transformation of (15). Thus the distribution of $R_1$ in terms of $\rho_0$ is

(36)  $$p(R_1 \mid \rho_0) = \frac{(1 - \rho_0^2)^{\frac{1}{2}(n-1)} (1 - R_1^2)^{\frac{1}{2}(n-3)}}{2^{n-2} B[\tfrac{1}{2}(n-1), \tfrac{1}{2}(n-1)] (1 - \rho_0 R_1)^{n-1}}.$$

The significance of $R_1$ may be assessed by the z-test, where we take

(37)  $$z = \tfrac{1}{2} \log \frac{1 + R_1}{1 - R_1} - \tfrac{1}{2} \log \frac{1 + \rho_0}{1 - \rho_0} = z' - \zeta, \text{ say,}$$

with degrees of freedom $f_1 = f_2 = n - 1$. R. A. Fisher's z-table may be used
in this connection.

When $\rho_{12} = 0$, the distribution simplifies to

(38)  $$p(R_1 \mid \rho_{12} = 0) = \frac{(1 - R_1^2)^{\frac{1}{2}(n-3)}}{2^{n-2} B[\tfrac{1}{2}(n-1), \tfrac{1}{2}(n-1)]} = \frac{(1 - R_1^2)^{\frac{1}{2}(n-3)}}{B[\tfrac{1}{2}(n-1), \tfrac{1}{2}]},$$

since $2^{n-2} B[\tfrac{1}{2}(n-1), \tfrac{1}{2}(n-1)]$ is equal to $B[\tfrac{1}{2}(n-1), \tfrac{1}{2}]$ by the duplication
formula [16, p. 240].

The distribution (38) is similar in form to that of $p(r_{12} \mid \rho_{12} = 0)$ with $n - 1$
degrees of freedom instead of $n - 2$. The significance levels of $R_1$ may then
be obtained directly from the r-table [1] for the case $\rho_{12} = 0$, entering with
degrees of freedom $n - 1$.

The distribution of $R_2$. The distribution of $R_2$ may be obtained from that of
$v$ when $H_4'$, and therefore $H_4$, is true. It is

(39)  $$p(R_2 \mid \rho_{12} = \rho_0) = \frac{(1 - \rho_0)^{\frac{1}{2}(n-1)} (1 + \rho_0)^{\frac{1}{2}n} (1 + R_2)^{\frac{1}{2}(n-3)} (1 - R_2)^{\frac{1}{2}(n-2)}}{2^{n-\frac{3}{2}} B[\tfrac{1}{2}(n-1), \tfrac{1}{2}n] (1 - \rho_0 R_2)^{n-\frac{1}{2}}}.$$

This agrees with the result first obtained by R. A. Fisher [4] by a different
method. The significance of $R_2$ may be assessed by the z-test, where we take

(40)  $$z = \tfrac{1}{2} \log \left( \frac{n}{n-1} \cdot \frac{1 + R_2}{1 - R_2} \cdot \frac{1 - \rho_0}{1 + \rho_0} \right)$$

² Since finding the distribution of $R_1$ (36), (38) and the relation between $R_1$ and $z'$ (37),
my attention has been drawn to a recent paper by DeLury [2] in which the same results
are obtained. Since my method of derivation is different from his, I have thought it
worthwhile to retain it here.



















NORMAL BIVARIATE POPULATION 419 


with degrees of freedom $f_1 = n - 1$, $f_2 = n$. The tables for use with the z-test
may be used in this connection.

When $\rho_{12} = 0$, the distribution is simplified to

(41)  $$p(R_2 \mid \rho_{12} = 0) = \frac{1}{2^{n-\frac{3}{2}} B[\tfrac{1}{2}(n-1), \tfrac{1}{2}n]} (1 + R_2)^{\frac{1}{2}(n-3)} (1 - R_2)^{\frac{1}{2}(n-2)}$$

which is simply a Pearson Type I curve.
Power functions of $R_1$ and $R_2$. In order to find the power functions of $R_1$ and
$R_2$ with respect to alternative hypotheses $H_t$ to $H_2$, specifying $\rho_{12} = \rho_t < \rho_0$,
it will be convenient to consider the incomplete beta function distributions

(42)  $$p(x_1) = \frac{1}{B[\tfrac{1}{2}(n-1), \tfrac{1}{2}(n-1)]} x_1^{\frac{1}{2}(n-3)} (1 - x_1)^{\frac{1}{2}(n-3)},$$

(43)  $$p(x_2) = \frac{1}{B[\tfrac{1}{2}(n-1), \tfrac{1}{2}n]} x_2^{\frac{1}{2}(n-3)} (1 - x_2)^{\frac{1}{2}(n-2)},$$

where $x_1 = u/(1 + u)$ and $x_2 = v/(1 + v)$. From the Tables of the Incomplete
Beta Function [13] we can find the values of $x_1$ and $x_2$ at the significance
level $\alpha'$, i.e.

(44)  $$I_{x_1'}[\tfrac{1}{2}(n-1), \tfrac{1}{2}(n-1)] = \alpha',$$

(45)  $$I_{x_2'}[\tfrac{1}{2}(n-1), \tfrac{1}{2}n] = \alpha'.$$

The values of $R_1'(\alpha)$, and of $R_2'(\alpha)$, may then be calculated from the relations

(46)  $$R_1 = \frac{2x_1 - 1 + \rho_0}{1 - \rho_0 + 2\rho_0 x_1},$$

(47)  $$R_2 = \frac{2x_2 - 1 + \rho_0}{1 - \rho_0 + 2\rho_0 x_2}.$$

The power functions of $R_1$ and $R_2$ thus found may be given as follows:

(48)  $$\beta'(\rho_t \mid R_1) = P\{R_1 < R_1'(\alpha) \mid \rho_t\},$$

(49)  $$\beta'(\rho_t \mid R_2) = P\{R_2 < R_2'(\alpha) \mid \rho_t\}.$$


In the same way, for any alternative hypothesis $H_t$ specifying $\rho_{12} = \rho_t > \rho_0$,
we can find the values of $x_1$ and $x_2$ at the significance level $\alpha''$, at the other end
of the distribution, i.e.

(50)  $$1 - I_{x_1''}[\tfrac{1}{2}(n-1), \tfrac{1}{2}(n-1)] = \alpha'',$$

(51)  $$1 - I_{x_2''}[\tfrac{1}{2}(n-1), \tfrac{1}{2}n] = \alpha''.$$

Thence the corresponding values of $R_1''(\alpha)$ and $R_2''(\alpha)$ may be obtained, and their
power functions are

(52)  $$\beta''(\rho_t \mid R_1) = P\{R_1 > R_1''(\alpha) \mid \rho_t\},$$

(53)  $$\beta''(\rho_t \mid R_2) = P\{R_2 > R_2''(\alpha) \mid \rho_t\}.$$

The power functions of $R_1$ and $R_2$ with respect to alternative hypotheses
specifying $\rho_{12} = \rho_t < \rho_0$ and $> \rho_0$ may now be obtained by adding (48) and (52) or
(49) and (53) or, more simply,

(54)  $$\beta(\rho_t \mid R_1) = 1 - P\{R_1'(\alpha) < R_1 < R_1''(\alpha) \mid \rho_t\},$$

(55)  $$\beta(\rho_t \mid R_2) = 1 - P\{R_2'(\alpha) < R_2 < R_2''(\alpha) \mid \rho_t\},$$

where $R_1'(\alpha)$, $R_1''(\alpha)$; $R_2'(\alpha)$, $R_2''(\alpha)$ are the values of $R_1$ and $R_2$ at the two ends
of the distribution at the significance level $\alpha = \alpha' + \alpha''$.

In view of the fact that after transformation the tests based on $R_1$ and $R_2$
are equivalent to tests regarding the equality of variances, it follows from Neyman
and Pearson's work [11] regarding the uniformly most powerful test of the
hypothesis that $\sigma_y^2/\sigma_x^2 = \gamma_0$, with alternatives $\sigma_y^2/\sigma_x^2 = \gamma_t < \gamma_0$ (or $\gamma_t > \gamma_0$),
that: (1) if $\sigma_1 = \sigma_2$ and the alternatives to $\rho_{12} = \rho_0$ are that $\rho_{12} = \rho_t < \rho_0$ (or, in a
second case, $\rho_t > \rho_0$) the test based on $R_1$ is the uniformly most powerful test,
i.e., it is more powerful than that based on $r_{12}$; and (2) if $\sigma_1 = \sigma_2$ and $\xi_1 = \xi_2$,
then the test based on $R_2$ is the uniformly most powerful test, i.e., it is more
powerful than those based on either $r_{12}$ or $R_1$.

For illustration, let us take a special case, say

(a)  $n = 10$, $\rho_0 = 0.6$, $\alpha' = \alpha'' = 0.025$.

From the tables, we obtain the values

    $x_1' = .198902$     $x_2' = .184863$
    $x_1'' = .801098$    $x_2'' = .772916$

and by calculation the values

    $R_1'(\alpha) = -.0034$     $R_2'(\alpha) = -.0487$
    $R_1''(\alpha) = .8831$     $R_2''(\alpha) = .8632$.
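
The arithmetic of case (a) can be retraced with a short sketch. It assumes that the relation between the beta-scale point $x$ and the critical value of $R$ has the form $R = (2x - 1 + \rho_0)/(1 - \rho_0 + 2\rho_0 x)$, which reproduces the four critical values quoted above; the function names `r_from_x` and `incomplete_beta` are illustrative only, and the incomplete beta function is approximated by Simpson's rule rather than read from the published tables.

```python
import math


def r_from_x(x, rho0):
    """Map a point x on the beta scale to the R scale under rho = rho0.

    Assumed relation (a hedged reading of the text's eqs. (46)-(47)):
        R = (2x - 1 + rho0) / (1 - rho0 + 2*rho0*x)
    """
    return (2.0 * x - 1.0 + rho0) / (1.0 - rho0 + 2.0 * rho0 * x)


def incomplete_beta(x, a, b, steps=4000):
    """Regularized incomplete beta function I_x(a, b) by composite Simpson rule.

    Adequate here because a, b > 1, so the integrand is bounded on [0, x].
    The number of steps must be even.
    """
    h = x / steps

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    total = integrand(0.0) + integrand(x)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    beta_ab = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return total * h / 3.0 / beta_ab


n, rho0 = 10, 0.6
# Tabled lower and upper points quoted in the text for case (a).
x_points = {"R1": (0.198902, 0.801098), "R2": (0.184863, 0.772916)}

# Critical values of R1 and R2 at the two ends of the distribution.
critical = {name: tuple(r_from_x(x, rho0) for x in pair)
            for name, pair in x_points.items()}

# Lower-tail area for the R1 test: I_x[(n-1)/2, (n-1)/2] at the lower point.
lower_tail_r1 = incomplete_beta(x_points["R1"][0], 0.5 * (n - 1), 0.5 * (n - 1))

for name, (low, high) in critical.items():
    print(name, round(low, 4), round(high, 4))
print("lower tail area for R1:", round(lower_tail_r1, 4))
```

The lower-tail area comes out close to the nominal 0.025, which is a useful check that the tabled point and the beta parameters have been read correctly.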


The values of the power functions of $R_1$ and $R_2$ for specified values of $\rho_t$ have
been calculated and are given in Table II. For $\rho_t < \rho_0$, a comparison of
columns 2 and 4 will show that the test based on $R_2$ is uniformly more powerful
than that based on $R_1$ (or for $\rho_t > \rho_0$, a comparison of columns 3 and 5).

The unbiased test of $H_2$ and $H_4$. When however the alternatives are that
$\rho_{12} = \rho_t < \rho_0$, and $\rho_t > \rho_0$, questions of bias may be introduced.

In the case of $H_2$, i.e. when $R_1$ is used, it was established by J. Neyman in
his lecture courses [8], that if we test whether $\sigma_y^2/\sigma_x^2 = \gamma_0$, where the alternatives
are $\gamma_t < \gamma_0$ and $\gamma_t > \gamma_0$, and if the samples of $X$ and $Y$ are of equal size, then
the test based on cutting off equal tail areas of the distribution of $x_1$ is unbiased
and of the type B [7]. Therefore the same may be said of the $R_1$-test.

In the case of $H_4$, the equivalent transformed test is again whether $\sigma_y^2/\sigma_x^2 =
\gamma_0$. But the test now corresponds to that in which an estimate of $\sigma_x^2$ is based
on $f_1 = n - 1$ degrees of freedom and an estimate of $\sigma_y^2$ on $f_2 = n$ degrees of
freedom. The degrees of freedom not being equal, it is known that if equal
tail areas are cut off from the sampling distribution of $x_2$, this test will be
biased. Neyman's result [8] shows that if the lower and upper significance
levels are taken at $x_2'$ and $x_2''$, then the equation

(56)  $$(x_2')^{f_1} (1 - x_2')^{f_2} = (x_2'')^{f_1} (1 - x_2'')^{f_2}$$

should be satisfied if the test is unbiased. Since in the present case, with the
test based on equal tail area critical region, the bias will be very small, the
rejection levels $R_2'(\alpha)$ and $R_2''(\alpha)$ in the numerical investigation given in Table
III have been selected taking equal tail areas for simplicity.
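
A rough numerical check of how far the equal-tail points are from satisfying the unbiasedness condition can be sketched as follows. The sketch assumes the condition in the form usual for variance-ratio tests on the beta scale, namely that $x^{f_1/2}(1 - x)^{f_2/2}$ takes the same value at the lower and upper points (squaring both sides gives the exponents $f_1$, $f_2$); the function name `neyman_side` is illustrative, and the points are the tabled values quoted for case (a).

```python
def neyman_side(x, f1, f2):
    """Quantity x^(f1/2) * (1 - x)^(f2/2) compared at the two critical points.

    Assumed form of the unbiasedness condition: the test is unbiased when this
    quantity is equal at the lower and upper points.
    """
    return x ** (0.5 * f1) * (1.0 - x) ** (0.5 * f2)


n = 10

# R2 test: unequal degrees of freedom f1 = n - 1, f2 = n, equal-tail points.
ratio_r2 = neyman_side(0.184863, n - 1, n) / neyman_side(0.772916, n - 1, n)

# R1 test: equal degrees of freedom, points symmetric about 1/2.
ratio_r1 = neyman_side(0.198902, n - 1, n - 1) / neyman_side(0.801098, n - 1, n - 1)

print("R2 ratio of the two sides:", round(ratio_r2, 4))
print("R1 ratio of the two sides:", round(ratio_r1, 4))
```

A ratio of exactly 1 would mean the condition holds. For equal degrees of freedom the symmetric points satisfy it automatically, while for the $R_2$ test the ratio falls a few per cent short of 1, in line with the remark that the bias of the equal-tail test is small.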


TABLE II
Values of the power functions of $R_1$ and $R_2$ with respect to alternative hypotheses
$\rho_{12} = \rho_t < \rho_0$ or $\rho_t > \rho_0$
($n = 10$; $\rho_0 = 0.6$; $\alpha' = \alpha'' = 0.025$)
(An entry shown as … is illegible in this copy.)

  $\rho_t$    $\beta'(\rho_t|R_1)$   $\beta''(\rho_t|R_1)$   $\beta'(\rho_t|R_2)$   $\beta''(\rho_t|R_2)$

  -0.8        .9984                                …
  -0.6        .9739                                .9807
  -0.4        .8867                                .9005
  -0.2        .7189                                .7360
   0.0        .4960             …                  .5093             .0001
   0.2        .2744             …                  .2809             .0006
   0.3        .1825             …                  .1860             .0015
   0.4        .1106             …                  .1111             .0037
   0.5        .0576             …                  .0580             .0093
   0.6        .025              .025               .025              .025
   0.7        .0081             .0678              .0080             .0720
   0.8        .0015             .1995              .0015             .2150
   0.9        .0001             .5950              .0001             .6289
   0.95                         .8979                                .9150
   0.975                        .9866                                .9897


If we now take a special case, similar to (a) above, but taking equal tail areas,
so that

    $n = 10$,  $\rho_0 = 0.6$,  $\alpha = 0.05$  ($\alpha' = \alpha'' = \tfrac{1}{2}\alpha$),

we can obtain the values of the $x$'s and of the $R$'s as before.

The values of the power functions of $R_1$ and $R_2$ for specified values of $\rho_t$ are
given in columns 3 and 4 of Table III. These values are equivalent to the
sums of the corresponding values in Table II. The values of the power functions
of $R_1$ and $R_2$ for the following additional cases are also given in Table III:

(b)  $n = 10$,  $\rho_0 = 0.8$,  $\alpha = 0.05$
(c)  $n = 20$,  $\rho_0 = 0.6$,  $\alpha = 0.05$
(d)  $n = 20$,  $\rho_0 = 0.8$,  $\alpha = 0.05$.


Comparison of the power functions. We may now deal with the question
raised at the beginning of this section, namely, as to what is meant by the
"better" or "best" test. We shall proceed to compare for certain special cases
the power functions of the three tests, all of which are applicable where it may
be assumed that $\sigma_1 = \sigma_2$, $\xi_1 = \xi_2$.

In the first place it will be noted that the power function of the test based on
equal tail areas of the $r_{12}$ distribution is

(57)  $$\beta(\rho_t \mid r_{12}) = 1 - P\{r_{12}'(\alpha) < r_{12} < r_{12}''(\alpha) \mid \rho_t\}$$

where

      $$P\{r_{12} < r_{12}'(\alpha) \mid \rho_0\} = \int_{-1}^{r_{12}'(\alpha)} p(r_{12} \mid \rho_{12} = \rho_0)\, dr_{12} = \tfrac{1}{2}\alpha,$$

(58)  $$P\{r_{12} > r_{12}''(\alpha) \mid \rho_0\} = \int_{r_{12}''(\alpha)}^{1} p(r_{12} \mid \rho_{12} = \rho_0)\, dr_{12} = \tfrac{1}{2}\alpha$$

and

(59)  $$p(r_{12} \mid \rho_{12} = \rho_0) = \frac{(1 - \rho_0^2)^{\frac{1}{2}(n-1)}}{\pi \Gamma(n-2)} (1 - r_{12}^2)^{\frac{1}{2}(n-4)} \frac{d^{n-2}}{d(\rho_0 r_{12})^{n-2}} \left\{ \frac{\cos^{-1}(-\rho_0 r_{12})}{\sqrt{1 - \rho_0^2 r_{12}^2}} \right\}.$$

The probability that $r_{12}$ is less than some specified value may be obtained from
Tables of the Correlation Coefficient (F. N. David, [1]), or, where these are not
sufficiently detailed, by using R. A. Fisher's z'-transformation for $r_{12}$ [4].

The cases considered are (a), (b), (c), (d) as defined above. The power
functions of the three different tests (all based upon the equal tail areas of their
distributions) are given in Table III. The figures for $r_{12}$ in the brackets are
those obtained by the z'-transformation approximation.

An examination of Tables II and III brings out the following points:

(1) For reasons given above, the $R_2$ test based on equal tail area critical
regions is very slightly biased; the amount of this bias for the case $n = 10$,
$\rho_0 = 0.6$, $\alpha = 0.05$ is shown in Table IV. This shows that the power of the $R_2$
test is less than 0.05 in the fifth or sixth decimal places for $0.59 < \rho_t < 0.60$.
As a result this test is very slightly less powerful than the other two tests for
alternatives with $\rho_t$ slightly less than $\rho_0$. The effect is, however, of little
importance.

(2) Except in this short range of $\rho_t$, we find that

      $$\beta(\rho_t \mid R_2) > \beta(\rho_t \mid R_1) > \beta(\rho_t \mid r_{12}).$$











TABLE III
Comparison of the power functions of the $r_{12}$, $R_1$, and $R_2$ tests with respect to alternative hypotheses
(Cases: $n = 10$, $\rho_0 = 0.6$; $n = 10$, $\rho_0 = 0.8$; $n = 20$, $\rho_0 = 0.6$; $n = 20$, $\rho_0 = 0.8$. Columns for each
case: $\beta(\rho_t \mid r_{12})$, $\beta(\rho_t \mid R_1)$, $\beta(\rho_t \mid R_2)$. The table is printed sideways in the original and
its entries are not recoverable in this copy.)







That is to say, the power function of the $R_2$ test never lies below those of the
$R_1$ and $r_{12}$ tests, and that of the $R_1$ test never lies below that of the $r_{12}$ test.

(3) The gain in sensitivity as measured by the chance that the test will
detect that $\rho_t \neq \rho_0$ is, however, very small. Further, $R_1$ may only be used if
it is known that $\sigma_1 = \sigma_2$ and $R_2$ if it is known in addition that $\xi_1 = \xi_2$. It will
only be in rather special problems that the statistician can feel confident that
such assumptions are justified. We will therefore probably prefer the test based
on the ordinary product moment correlation coefficient $r_{12}$, since the slight loss
in power will be felt to be outweighed by the gain in simplicity. It is, however,
only after an objective comparison of the consequences of applying the three
tests that a definite opinion on these points can be reached.
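
The slight bias referred to in point (1) can be read off the Table IV entries directly. The sketch below takes the tabulated lower-tail and upper-tail powers of the $R_2$ test for $\rho_t$ between 0.590 and 0.599, adds them, and lists the alternatives at which the total falls below the significance level 0.05; the variable names are illustrative only.

```python
# (rho_t, lower-tail power, upper-tail power, total power) as tabulated for the
# R2 test with n = 10, rho_0 = 0.6, alpha = 0.05.
table_iv = [
    (0.590, 0.0274235, 0.0225806, 0.0500041),
    (0.591, 0.0271778, 0.0228190, 0.0499968),
    (0.592, 0.0269359, 0.0230578, 0.0499937),
    (0.593, 0.0266934, 0.0232976, 0.0499910),
    (0.594, 0.0264515, 0.0235337, 0.0499852),
    (0.595, 0.0262096, 0.0237798, 0.0499894),
    (0.596, 0.0259677, 0.0240222, 0.0499899),
    (0.597, 0.0257257, 0.0242651, 0.0499908),
    (0.598, 0.0254838, 0.0245107, 0.0499945),
    (0.599, 0.0252419, 0.0247540, 0.0499959),
]

alpha = 0.05

# The total power should be the sum of the two one-sided powers.
max_sum_error = max(abs(low + high - total) for _, low, high, total in table_iv)

# Alternatives at which the two-sided test rejects less often than under rho_0.
biased_points = [rho for rho, _, _, total in table_iv if total < alpha]

# Alternative with the smallest tabulated power, and the shortfall below alpha.
worst_rho, _, _, worst_power = min(table_iv, key=lambda row: row[3])
shortfall = alpha - worst_power

print("largest discrepancy in the sums:", max_sum_error)
print("alternatives with power below alpha:", biased_points)
print("smallest power at rho_t =", worst_rho, "shortfall:", round(shortfall, 7))
```

The shortfall is of the order of $10^{-5}$, which is the sense in which the effect is described as of little importance.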


TABLE IV

  $\rho_t$     $\beta'(\rho_t|R_2)$    $\beta''(\rho_t|R_2)$    $\beta(\rho_t|R_2)$

  0.5        .0580            .0093             .0673
  0.590      .0274235         .0225806          .0500041
  0.591      .0271778         .0228190          .0499968
  0.592      .0269359         .0230578          .0499937
  0.593      .0266934         .0232976          .0499910
  0.594      .0264515         .0235337          .0499852
  0.595      .0262096         .0237798          .0499894
  0.596      .0259677         .0240222          .0499899
  0.597      .0257257         .0242651          .0499908
  0.598      .0254838         .0245107          .0499945
  0.599      .0252419         .0247540          .0499959
  0.6        .025             .025              .05


5. Summary.


Various hypotheses relating to a population of two normal
correlated variates have been considered and the appropriate test criteria for
each hypothesis have been derived by the likelihood ratio method. The dis- 
tributions of the likelihood ratio criteria or of monotonic functions of them have 
been obtained with the aid of transformation (14). References have been given 
to tables from which significance levels for use in conjunction with the tests 
may be obtained; a new table of significance levels for the tests of H, and Hy 
was given. 

The power functions of $r_{12}$, $R_1$ and $R_2$ have been compared; from these power
functions it was concluded that $R_1$ and $R_2$ are suitable respectively for testing
the hypothesis when $\sigma_1 = \sigma_2$ and when, in addition, $\xi_1 = \xi_2$.

In conclusion, I should like to express my indebtedness to Professor E. S.
Pearson for continued advice and help in the preparation of this paper, and to Dr.
A. Wald and Professor S. S. Wilks for valuable suggestions.





TABLE V
Conditions defining $\Omega$ and $\omega$ together with the likelihood criteria appropriate for testing the hypotheses $H_i$
(Column headings: (1) Hypothesis; (2) Initial assumptions (apart from normality); (3) To be tested;
(4) Conditions defining $\Omega$; (5) Conditions defining $\omega$; (6) Criterion. The table is printed sideways in
the original and its entries are not recoverable in this copy.)







REFERENCES 


[1] F. N. David, Tables of the Correlation Coefficient, London: Biometrika Office, 1938.
[2] D. B. DeLury, Ann. Math. Stat., Vol. 9 (1938), p. 149.
[3] D. J. Finney, Biometrika, Vol. 30 (1938), p. 190.
[4] R. A. Fisher, Metron, Vol. 1 (1921), p. 3.
[5] R. A. Fisher, Statistical Methods for Research Workers, 7th ed., Edinburgh: Oliver
and Boyd, 1938.
[6] W. A. Morgan, Biometrika, Vol. 31 (1939), p. 13.
[7] J. Neyman, Bull. Soc. Math. France, Vol. 63 (1935), p. 246.
[8] J. Neyman, Lectures delivered in London, 1937-8 (unpublished).
[9] J. Neyman and E. S. Pearson, Biometrika, Vol. 20A (1928), p. 175.
[10] J. Neyman and E. S. Pearson, Bull. Acad. Polonaise Sci. Lettres, A (1931), p. 460.
[11] J. Neyman and E. S. Pearson, Phil. Trans., Ser. A, Vol. 231 (1933), p. 289.
[12] J. Neyman and E. S. Pearson, Stat. Res. Mem., Vol. 1 (1936), p. 1.
[13] K. Pearson (Editor), Tables of the Incomplete Beta-Function, London: Biometrika
Office.
[14] E. S. Pearson and J. Neyman, Bull. Acad. Polonaise Sci. Lettres, A (1930), p. 73.
[15] E. J. G. Pitman, Biometrika, Vol. 31 (1939), p. 9.
[16] E. T. Whittaker and G. N. Watson, Modern Analysis, 4th Edition (1927).


University College,
London, England.













ON A LEAST SQUARES ADJUSTMENT OF A SAMPLED FREQUENCY 
TABLE WHEN THE EXPECTED MARGINAL TOTALS ARE KNOWN 


By W. Edwards Deming and Frederick F. Stephan


1. Introduction. There are situations in sampling wherein the data furnished
by the sample must be adjusted for consistency with data obtained from
other sources or with deductions from established theory. For example, in the
1940 census of population a problem of adjustment arises from the fact that
although there will be a complete count of certain characters for the individuals
in the population, considerations of efficiency will limit to a sample many of
the cross-tabulations (joint distributions) of these characters. The tabulations
of the sample will be used to estimate the results that would have been obtained
from cross-tabulations of the entire population.¹ The situation is shown in
Fig. 1 in parallel tables for the universe and for the sample. For the universe
the marginal totals $N_{i.}$ and $N_{.j}$ are known, but not the cell frequencies $N_{ij}$;
for the sample, however, tabulation gives both the cell frequencies $n_{ij}$ and the
marginal totals $n_{i.}$ and $n_{.j}$.

In estimating any cell frequency of the universe, such as $N_{ij}$, three
possibilities present themselves; from the sample one may make an estimate from
the $i$th row alone, another from the $j$th column alone, and still another from the
over-all ratio $n_{ij}/n$: specifically, the three estimates would be $n_{ij}N_{i.}/n_{i.}$,
$n_{ij}N_{.j}/n_{.j}$, and $n_{ij}N/n$. As a result of sampling errors these will not be identical
except by accident, and though any of them by itself may be considered accurate
enough, still, if the whole $r \times s$ table of universe cell frequencies were so
estimated, the marginal totals would not come out right. In this paper we
present a rapid method of adjustment, which in effect combines all three of the
estimates just mentioned, and at the same time enforces agreement with the
marginal totals. The method is extended to varying degrees of cross-tabulation
in three dimensions.

¹ Examples will occur in the 1940 census publications. Further discussion of this problem
and of the sampling procedure is given by the authors in "The sampling procedure
of the 1940 population census," Jour. Am. Stat. Assn., Vol. 35 (1940), pp. 615-630.

In any problem of adjustment where the conditions are intricate it is necessary
to have a method that is straightforward and self-checking; this becomes
imperative when we realize that in the three-dimensional Case VII of the
problem now at hand (vide infra), any adjustment in one cell must be balanced
by adjustments in at least seven others. The method of least squares is one
possible procedure for effecting an adjustment and at the same time enforcing
certain conditions among the marginal totals. It is essentially a scheme for
arriving at a set of calculated or adjusted observations that will satisfy the
conditions of the problem, and at the same time minimize the sum of
the weighted squares of the residuals, symbolized as

(1)  $$S = \sum w (n_c - n_o)^2,$$

$n_c$ and $n_o$ being the calculated and observed numbers in a cell, and $n_c - n_o$ the
corresponding residual. It is the nature of the conditions imposed on the
adjusted values that distinguishes one type of problem from another. Least
squares has the practical advantage of uniqueness, once the weights of the
observations have been assigned, and it possesses the theoretical dignity of giving
one kind of "best" estimates under ideal conditions of sampling. For our
present purpose we shall minimize sums of the form

(2)  $$S = \sum (m_i - n_i)^2 / n_i,$$

$n_i$ being the observed frequency in the $i$th cell, and $m_i$ the calculated or adjusted
frequency therein. The conditions among the $m_i$ will arise from the fact that
the marginal totals, after adjustment, must agree with their expected values,
namely, the deflated marginal totals of the universe (for example, $m_{i.}$ and $m_{.j}$ as
defined in eqs. (6) and (7)).

By definition, weight and variance are inversely proportional, hence the
principle of least squares is identical with the minimizing of chi-square. Here
the variance in the $i$th cell is $\nu_i(1 - \nu_i/n)$, where $\nu_i$ is the expected number in
that cell, and $n$ is the total number in the sample. Now if $\nu_i$ is sufficiently
well approximated by $n_i$, it follows that if no cell contains an appreciable
fraction of the whole sample (a circumstance requiring a fair sized number of
cells, perhaps 100), the variance may be taken as $\nu_i$ for every $i$, and the
minimized $S$ can be used as chi-square. But regardless of the number of cells, if
the $n_i$ be not too much different from one another, so that the factor $1 - \nu_i/n$
may be treated as a constant, we still get the least squares solution by minimizing
$S$ as defined in eq. (2).


2. The two dimensional problem. Suppose that the data on two characteristics
(e.g. age and highest grade of school completed) are obtained for each
member of a universe of $N$ individuals, and that tabulations of the data provide
either (a) one set of marginal totals $N_{1.}, N_{2.}, \dots, N_{r.}$; or (b) in addition, the
marginal totals $N_{.1}, N_{.2}, \dots, N_{.s}$. The nature of the tabulations is presumed
such that it is not feasible to count the numbers $N_{ij}$ in the cells, as would be
done if one character were crossed with the other. Suppose, however, that for
a sample of $n$ individuals selected in a random manner from the universe, the
two characters are crossed with each other, so that we know not only all the
$s + r$ marginal totals $n_{1.}, \dots, n_{r.}$ and $n_{.1}, \dots, n_{.s}$ of the sample but also the
numbers $n_{ij}$ ($i = 1, 2, \dots, r$; $j = 1, 2, \dots, s$). The problem is to estimate the unknown
frequencies $N_{ij}$ in the cells of the universe. This will be done by finding the
calculated or adjusted sample frequencies $m_{ij}$ and then inflating them by the
inverse sampling ratio $N/n$.







For the least squares solution we seek those values of $m_{ij}$ that minimize²

(3)  $$S = \sum (m_{ij} - n_{ij})^2 / n_{ij}$$

wherein the $m_{ij}$ are subjected to one of the following sets of conditions:

Case I: One set of marginal totals known. Assume $N_{1.}, N_{2.}, \dots, N_{r.}$ to be
known. Then we require

(4)  $$\sum_j m_{ij} = m_{i.}, \qquad i = 1, 2, \dots, r.$$

These $r$ equations constitute $r$ conditions on the adjusted $m_{ij}$.

    UNIVERSE: $N_{ij}$ unknown; marginal totals $N_{.j}$ and $N_{i.}$ known; $N$ known.
    SAMPLE: $n_{ij}$ known; marginal totals $n_{.j}$ and $n_{i.}$ known; $n$ known.

Fig. 1. Showing the system of notation for the cell frequencies and marginal
totals of the universe and the sample in the two dimensional problem

Case II: Both sets of marginal totals known. Here the adjusted cell frequencies
must satisfy not only condition (4) but also

(5)  $$\sum_i m_{ij} = m_{.j}, \qquad j = 1, 2, \dots, s - 1,$$

there being now a total of $r + s - 1$ conditions. In both cases,

(6)  $$m_{i.} = N_{i.}\, n/N,$$

(7)  $$m_{.j} = N_{.j}\, n/N.$$

In other words, $m_{i.}$ and $m_{.j}$ are the deflated marginal totals, i.e., $N_{i.}$ and $N_{.j}$
divided by the actual sampling ratio $N/n$. The $m_{i.}$ and $m_{.j}$ are not independent,
for

(8)  $$N_{1.} + N_{2.} + \cdots + N_{r.} = N_{.1} + N_{.2} + \cdots + N_{.s} = N.$$

It is for this reason that if $i$ runs through all $r$ values in eq. (4), then $j$ can run
through only $s - 1$ in eq. (5). A similar equation also exists for the marginal
totals of the sample, namely,

(9)  $$n_{1.} + n_{2.} + \cdots + n_{r.} = n_{.1} + n_{.2} + \cdots + n_{.s} = n.$$

² The sign $\sum$ will denote summation over all possible cells, unless otherwise noted.
$\sum_i$ will denote summation over all values of $i$, and similarly for an inferior $j$ or $k$. The
dot, as in $n_{.j}$, will signify the result of summing the $n_{ij}$ over all values of $i$ in the $j$th
column.

Solution of the two dimensional Case I. Assuming that the adjusted values
of the $m_{ij}$ have been found, let each take on a small variation $\delta m_{ij}$; then the
differentials of eqs. (3) and (4) show that

(10)  $$\tfrac{1}{2}\delta S = \sum \{(m_{ij} - n_{ij})/n_{ij}\}\, \delta m_{ij} = 0 \qquad \text{(one equation)},$$

(11)  $$\sum_j \delta m_{ij} = 0, \qquad i = 1, 2, \dots, r \qquad (r \text{ equations}).$$

Multiply now eq. (11$i$) by the arbitrary Lagrange multiplier $-\lambda_{i.}$, and add eqs.
(10) and (11) to obtain

(12)  $$\sum \{(m_{ij} - n_{ij})/n_{ij} - \lambda_{i.}\}\, \delta m_{ij} = 0 \qquad \text{(one equation)}.$$

By the usual argument, one may now set each brace equal to zero, recognizing
that the $r$ Lagrange multipliers are then no longer arbitrary but must satisfy
the relation

(13)  $$m_{ij} = n_{ij}(1 + \lambda_{i.}).$$

The adjusted frequencies $m_{ij}$ can be computed at once as soon as the $\lambda_{i.}$ are
found. To evaluate them one may rewrite the conditions (4) using the right-hand
member of (13) for $m_{ij}$, obtaining

(14)  $$m_{i.} = n_{i.}(1 + \lambda_{i.}).$$

Another way to arrive at this same relation is to sum each member of eq. (13)
in the $i$th row. However obtained, $\lambda_{i.}$ is now known, since $m_{i.}$ and $n_{i.}$ are
known, and in fact eq. (13) now gives

(15)  $$m_{ij} = n_{ij}(m_{i.}/n_{i.}).$$

The adjustment is thus a simple proportionate one by rows, the cells in any one
row all being raised or lowered by the proportionate adjustment in the row total.
Case I thus amounts to $r$ independent one dimensional proportionate adjustments,
one for each row, and any one or all may be carried out, as desired.
This result can be obtained by a simpler approach but is presented in this way
for consistency with later cases.
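
The Case I adjustment is easy to illustrate. The sketch below applies the proportionate adjustment by rows to a small table and computes the sum of weighted squared residuals both cell by cell and from the row totals alone. The sample frequencies and the expected row totals are hypothetical, chosen only for illustration, and the function names are not from the paper.

```python
def adjust_rows(sample, row_targets):
    """Proportionate adjustment by rows: m_ij = n_ij * (m_i. / n_i.)."""
    adjusted = []
    for row, target in zip(sample, row_targets):
        row_total = sum(row)
        adjusted.append([cell * target / row_total for cell in row])
    return adjusted


def weighted_squares(sample, adjusted):
    """S = sum over all cells of (m_ij - n_ij)^2 / n_ij."""
    return sum((m - n) ** 2 / n
               for n_row, m_row in zip(sample, adjusted)
               for n, m in zip(n_row, m_row))


# Hypothetical 2 x 3 table of sample frequencies and expected row totals.
sample = [[30.0, 50.0, 20.0],
          [10.0, 25.0, 15.0]]
row_targets = [110.0, 45.0]

adjusted = adjust_rows(sample, row_targets)

# The same minimized S, computed cell by cell and from the row totals only.
s_from_cells = weighted_squares(sample, adjusted)
s_from_rows = sum((target - sum(row)) ** 2 / sum(row)
                  for row, target in zip(sample, row_targets))

print(adjusted)
print(s_from_cells, s_from_rows)
```

Each row is scaled by its own factor (here 1.1 and 0.9), so the two computations of the minimized sum must agree.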

The minimized sum of squares may be computed directly, or from the row
totals by seeing that

(16)  $$S = \sum_i (m_{i.} - n_{i.})^2 / n_{i.}.$$

The term $(m_{i.} - n_{i.})^2/n_{i.}$ for the $i$th row may be considered separately, and
used as $\chi^2$ with $s - 1$ degrees of freedom, or all rows may be combined into
the minimized $S$ as given in eq. (16), and used as $\chi^2$ with $r(s - 1)$ degrees of
freedom.

Solution of the two dimensional Case II. In addition to eqs. (11) we now
have also

(17)  $$\sum_i \delta m_{ij} = 0, \qquad j = 1, 2, \dots, s - 1,$$

which comes by differentiating eqs. (5). By addition of eqs. (10), (11), and
(17), after multiplying eq. (11$i$) by $-\lambda_{i.}$ and eq. (17$j$) by $-\lambda_{.j}$, we obtain

(18)  $$\sum \{(m_{ij} - n_{ij})/n_{ij} - \lambda_{i.} - \lambda_{.j}\}\, \delta m_{ij} = 0.$$

Equating each brace to zero, as before, we find that

(19)  $$m_{ij} = n_{ij}(1 + \lambda_{i.} + \lambda_{.j})$$

wherein $\lambda_{.s}$ is to be counted 0. The adjustment is now no longer proportionate
by rows, but involves every cell.

To evaluate the Lagrange multipliers in eq. (19) we may sum the two members
downward and across in Fig. 1 and obtain the $r + s - 1$ normal equations

(20)  $$\begin{aligned}
n_{i.}\lambda_{i.} + \sum_j n_{ij}\lambda_{.j} &= m_{i.} - n_{i.}, & i &= 1, 2, \dots, r, \\
\sum_i n_{ij}\lambda_{i.} + n_{.j}\lambda_{.j} &= m_{.j} - n_{.j}, & j &= 1, 2, \dots, s - 1.
\end{aligned}$$

These can be reduced for numerical computation. The top row solved for
$\lambda_{i.}$ gives

(21)  $$\lambda_{i.} = (1/n_{i.})\Big\{m_{i.} - \sum_j n_{ij}\lambda_{.j}\Big\} - 1,$$

whereupon by substitution into the bottom row of eqs. (20) we arrive at the
$s - 1$ normal equations

(22)  $$\begin{array}{ccccl}
\lambda_{.1} & \lambda_{.2} & \cdots & \lambda_{.,s-1} & = 1 \\[4pt]
n_{.1} - \sum_i \dfrac{n_{i1} n_{i1}}{n_{i.}} & -\sum_i \dfrac{n_{i1} n_{i2}}{n_{i.}} & \cdots & -\sum_i \dfrac{n_{i1} n_{i,s-1}}{n_{i.}} & = m_{.1} - \sum_i \dfrac{n_{i1} m_{i.}}{n_{i.}} \\[8pt]
 & n_{.2} - \sum_i \dfrac{n_{i2} n_{i2}}{n_{i.}} & \cdots & -\sum_i \dfrac{n_{i2} n_{i,s-1}}{n_{i.}} & = m_{.2} - \sum_i \dfrac{n_{i2} m_{i.}}{n_{i.}} \\[8pt]
 & & \ddots & \vdots & \quad \vdots \\[4pt]
 & & & n_{.,s-1} - \sum_i \dfrac{n_{i,s-1} n_{i,s-1}}{n_{i.}} & = m_{.,s-1} - \sum_i \dfrac{n_{i,s-1} m_{i.}}{n_{i.}} \\[8pt]
 & & & & \quad 0.
\end{array}$$

Because of symmetry in the coefficients, those below the diagonal are not shown;
indeed, in a systematic computation, they are not used. The 0 in the bottom
row is appended for the computation of the minimized $S$, if desired. The
number of Lagrange multipliers to be solved for directly is $s - 1$, and the
remaining ones come by substitution into eq. (21), $\lambda_{.s}$ being counted 0.

A simple procedure for calculating the coefficients in the normal equations
(22) is to set up a preparatory table by dividing each $n_{ij}$ in the $i$th row by $\sqrt{n_{i.}}$;
also to write down $m_{i.}/\sqrt{n_{i.}}$ for that row, for use on the right-hand side of the
normal equations (compare Tables I and II). In machine calculation the constant
divisor $\sqrt{n_{i.}}$ would be left on the keyboard until the entire $i$th row is
divided; or, if reciprocal multiplication is preferred, the multiplier $1/\sqrt{n_{i.}}$ would
be left on. From this preparatory table, the cumulation of squares and
cross-products in the vertical gives the required summations for the coefficients. The
sum check would be applied in the usual manner.
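
The whole Case II computation can be sketched in a few lines: form the $s - 1$ reduced normal equations, solve them for the column multipliers, recover the row multipliers, and apply the adjustment to every cell. The sketch below forms the coefficients directly from the $n_{ij}$ rather than through the preparatory table, uses ordinary Gaussian elimination as the "favorite procedure", and works on a hypothetical 2 x 3 table; none of the function names or numbers come from the paper.

```python
def solve_linear(coeff, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [value] for row, value in zip(coeff, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * size
    for r in range(size - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, size))
        solution[r] = (aug[r][size] - known) / aug[r][r]
    return solution


def adjust_two_way(n, row_targets, col_targets):
    """Least squares adjustment of an r x s table to both sets of marginal totals."""
    r, s = len(n), len(n[0])
    n_row = [sum(row) for row in n]
    n_col = [sum(n[i][j] for i in range(r)) for j in range(s)]
    # Reduced normal equations for the first s - 1 column multipliers.
    coeff = [[(n_col[j] if j == k else 0.0)
              - sum(n[i][j] * n[i][k] / n_row[i] for i in range(r))
              for k in range(s - 1)] for j in range(s - 1)]
    rhs = [col_targets[j] - sum(n[i][j] * row_targets[i] / n_row[i] for i in range(r))
           for j in range(s - 1)]
    lam_col = solve_linear(coeff, rhs) + [0.0]  # last column multiplier counted 0
    # Row multipliers by back-substitution.
    lam_row = [(row_targets[i] - sum(n[i][j] * lam_col[j] for j in range(s))) / n_row[i] - 1.0
               for i in range(r)]
    # Adjusted frequencies m_ij = n_ij (1 + lambda_i. + lambda_.j).
    return [[n[i][j] * (1.0 + lam_row[i] + lam_col[j]) for j in range(s)]
            for i in range(r)]


# Hypothetical sample table and expected marginal totals (both sets sum to 155).
sample = [[30.0, 50.0, 20.0],
          [10.0, 25.0, 15.0]]
row_targets = [110.0, 45.0]
col_targets = [42.0, 80.0, 33.0]

adjusted = adjust_two_way(sample, row_targets, col_targets)
for row in adjusted:
    print([round(cell, 3) for cell in row])
```

Only the first $s - 1$ column conditions are imposed explicitly. The last column total comes out right automatically, because the two sets of expected marginal totals have the same sum.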


3. A numerical example of the two dimensional Case II. The fact is that
in practice one need not bother about forming and solving the normal equations
because they will be displaced by a simplifying iterative procedure, to be
explained in a later section. For illustration, however, we may do an example
both ways, first using the normal equations and the adjustment (19), later on
accomplishing the same results by the quicker method.

We may start with the unitalicized numbers in the 4 × 6 array of Table I,
assuming these to be the sampling frequencies $n_{ij}$ to be adjusted. Actually,
they were obtained by deflating to 1/20th (for a supposed 5 per cent sample) the
New England age × state table on p. 1108 of vol. 2 of the Fifteenth Census of
the U. S., 1930, then varying the deflated values by chance with Tippett's
numbers to get our sampling frequencies $n_{ij}$. The italicized entries in Table I
represent the final (adjusted) $m_{ij}$, and it is these that we now set out to get.
We start off with the sampling frequencies $n_{ij}$ and the known marginal totals
$m_{1.}$, $m_{2.}$, etc., where $m_{i.} = N_{i.}n/N$, $m_{.j} = N_{.j}n/N$, as in eqs. (6) and (7).
The Lagrange multipliers shown along the left-hand and top borders arise in the
calculations now to be undertaken.

Table II is the preparatory table, advised at the close of the last section. It
is derived from Table I by dividing the $i$th row of sample frequencies by $\sqrt{n_{i.}}$.
For example, the entry 8.64 in the cell $i = 3$, $j = 2$ comes by dividing 419 by
$\sqrt{2352}$, 419 being the entry in the cell of the same indices in Table I, and 2352
being the sum of the third row. The sums at the bottom and right-hand side
are for checking the formation of the normal equations. The cumulations of
squares and cross-products along the vertical give the summations required for
the normal eqs. (22), which now appear numerically as eqs. (23).


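(23)  $$\begin{array}{rrrl}
\lambda_{.1} & \lambda_{.2} & \lambda_{.3} & = 1 \\
7413 & -3549 & -2354 & = 3197 \times 10^{-2} \\
 & 4441 & -544 & = 2356 \times 10^{-2} \\
 & & 3129 & = -3222 \times 10^{-2} \\
 & & & \quad 0
\end{array}$$

Performing the solution by any favorite procedure one will obtain

(24)  $$\lambda_{.1} = .01182, \qquad \lambda_{.2} = .01490, \qquad \lambda_{.3} = .00119$$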
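
The three numerical normal equations can be solved by any standard method; the sketch below uses Cramer's rule on the full symmetric matrix. It assumes that the right-hand members are 31.97, 23.56 and -32.22 (the printed figures multiplied by $10^{-2}$, with the third taken as negative, which is the sign that reproduces the multipliers quoted in the text); the function name `det3` is illustrative only.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Coefficients of the normal equations, with the symmetric lower half filled in.
coeff = [[7413.0, -3549.0, -2354.0],
         [-3549.0, 4441.0, -544.0],
         [-2354.0, -544.0, 3129.0]]
# Right-hand members; the sign of the third is an assumption (see lead-in).
rhs = [31.97, 23.56, -32.22]

base = det3(coeff)
multipliers = []
for k in range(3):
    replaced = [row[:] for row in coeff]
    for i in range(3):
        replaced[i][k] = rhs[i]  # Cramer's rule: swap column k for the right side
    multipliers.append(det3(replaced) / base)

print([round(value, 5) for value in multipliers])

# One adjusted cell from the multipliers: row 3 (row multiplier .0234), column 2.
m_32 = 419 * (1 + 0.0234 + multipliers[1])
print(round(m_32))
```

The solution agrees with the quoted multipliers to within rounding in the last printed figure, and the adjusted value for the cell $i = 3$, $j = 2$ rounds to 435.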


TABLE I
A table of artificial sample frequencies, an artificial 5 percent sample of native
white persons of native white parentage attending school, by age by state, New
England, 1930. The adjusted frequency $m_{ij}$ in each cell is shown italicized
just below the corresponding sample frequency $n_{ij}$
(Here the adjusted values form the second line of each state; an entry shown as … is illegible in this copy.)

                                              Age
  State            $\lambda_{i.}$     7 to 13   14 & 15   16 & 17   18 to 20    Total

  Maine            -.0146      3623       …         …        313       5274
                               3613       …         …        308       5252

  New Hampshire    -.0003      1570       …         …        155       2371
                               1588       …         …        156       2396

  Vermont          +.0234      1553      419        …        116       2352
                                 …       435        …         …        2432

  Massachusetts    -.0162        …        …         …         …       15859
                                 …        …         …         …          …

  Rhode Island     -.0230      1681       …         …         …        2359
                               1662       …         …         …        2330

  Connecticut      -.0034        …        …         …         …          …
                                 …        …         …         …          …

  $n_{.j}$                           …      5260      3493      2237         …
  $m_{.j}$                       22877      5285      3462      2213         …

The adjusted $m_{ij}$ (italicized) are rounded off, hence when summed may occasionally
disagree a unit or so with the expected marginal totals (also italicized); the latter arise
by deflation from the universe rather than by direct addition of the $m_{ij}$.


whereupon by substitution into eq. (21) comes

(25)  $$\begin{array}{ll}
\lambda_{1.} = -.0146 & \lambda_{4.} = -.0162 \\
\lambda_{2.} = -.0003 & \lambda_{5.} = -.0230 \\
\lambda_{3.} = +.0234 & \lambda_{6.} = -.0034.
\end{array}$$

The next step is to compute the $m_{ij}$ by eq. (19). Table I is now bordered
with the Lagrange multipliers for a convenient arrangement of the factors
required, and the calculation is completed. It will be noted that, for example,

(26)  $$m_{32} = 419(1 + .0234 + .0149) = 435.$$


The $m_{ij}$ thus calculated are shown italicized in Table I. The marginal totals,
found by adding the $m_{ij}$ just calculated, do not agree exactly everywhere with
the expected totals, because of rounding off to integers: the errors of closure,
however, are slight, and it is a simple matter to raise or lower some of the larger
cells by a unit or two to force exact satisfaction of the conditions, if this is
desired.


4. The three dimensional problem. Here the $N$ cards of the universe are
sorted and counted for one and perhaps a second and third characteristic, and
possibly crossed by pairs in various combinations (Cases I-VII). The sample
of $n$, however, is crossed by all three characteristics, which is to say that the
cell frequencies $n_{ijk}$ are all known (refer to Fig. 2). As before, the adjusted
frequencies are required.

TABLE II
This comes by dividing each sample frequency in Table I by the corresponding $\sqrt{n_{i.}}$
(This operation would ordinarily be done a row at a time)
(An entry shown as … is illegible in this copy.)

   $i$      $n_{i1}/\sqrt{n_{i.}}$   $n_{i2}/\sqrt{n_{i.}}$   $n_{i3}/\sqrt{n_{i.}}$   $n_{i4}/\sqrt{n_{i.}}$   $m_{i.}/\sqrt{n_{i.}}$      Sum

   1         49.89         …           …           …          72.32       144.94
   2         32.24         …           …           …          49.19        97.87
   3         32.02        8.64         …           …          50.15        98.64
   4         83.68         …           …           …         125.19       251.12
   5         34.61         …           …           …          47.97        96.54
   6         51.77         …           …           …          75.51       150.49

   Sum      284.21       65.69         …           …         420.33       839.60

Case I: One set of slice totals known. Assume the slice totals $N_{1..}, N_{2..}, \ldots, N_{r..}$
to be known; the conditions are then

(27)    $\sum_{j,k} m_{ijk} = m_{i..} = N_{i..}\,n/N, \qquad i = 1, 2, \ldots, r,$

being r in number. The summation to be minimized is

(28)    $S = \sum (m_{ijk} - n_{ijk})^2 / n_{ijk},$

being similar to that in eq. (3), except that now there are three indices to be
summed over instead of two. Following a procedure similar to that used before,
we differentiate eqs. (27) and (28) and introduce the r Lagrange multipliers $\lambda_{i..}$
with eq. (27). The steps are identical with those of the two dimensional Case I,
and the result is at once

(29)    $m_{ijk} = n_{ijk}(1 + \lambda_{i..}) = n_{ijk}(m_{i..}/n_{i..}).$

This adjustment, like that shown by eq. (15), is a simple proportionate one, but
this time by slices rather than by columns. All cell frequencies having the same
i index are raised or lowered in the same proportion.
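A proportionate adjustment by slices, in the sense of eq. (29), might be sketched as follows. The 2 × 2 × 2 sample and the expected slice totals are hypothetical numbers chosen only for illustration, and the function name `adjust_by_slices` is an assumption introduced here.

```python
def adjust_by_slices(n, m_slice):
    """Multiply every cell of slice i by m_i.. / n_i.. (the form of eq. (29)).

    n is a nested list indexed n[i][j][k]; m_slice[i] is the expected total m_i.. .
    """
    adjusted = []
    for slice_i, target in zip(n, m_slice):
        n_total = sum(sum(row) for row in slice_i)   # n_i..
        factor = target / n_total                    # m_i.. / n_i..
        adjusted.append([[cell * factor for cell in row] for row in slice_i])
    return adjusted


# Hypothetical sample frequencies n[i][j][k] (illustrative only).
n = [[[10, 20], [30, 40]],
     [[5, 15], [25, 35]]]
m_slice = [110.0, 72.0]        # hypothetical expected slice totals

m = adjust_by_slices(n, m_slice)
for i, slice_i in enumerate(m):
    print(i + 1, slice_i, sum(sum(row) for row in slice_i))
```

Every cell with the same i index is changed in the same proportion, so the ratios within a slice are left as the sample gave them.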


FIG. 2. SHOWING THE SYSTEM OF NOTATION FOR THE CELL FREQUENCIES AND MARGINAL
TOTALS IN THE THREE DIMENSIONAL SAMPLE


Case II: Two sets of slice totals known. Here, in addition to the slice totals
of Case I we know also

        $N_{.1.}, N_{.2.}, \ldots, N_{.s.}$

whence arise the s - 1 additional conditions

(30)    $\sum_{i,k} m_{ijk} = m_{.j.} = N_{.j.}\,n/N, \qquad j = 1, 2, \ldots, s - 1.$

Using the Lagrange multiplier $\lambda_{.j.}$ here, and $\lambda_{i..}$ with eq. (27) as before, we
find that

(31)    $m_{ijk} = n_{ijk}(1 + \lambda_{i..} + \lambda_{.j.})$

in which $\lambda_{.s.}$ is to be counted zero. This adjustment is proportionate by tubes,
the ratio $m_{ijk}/n_{ijk}$ being constant along the ijth tube and in fact equal to
$m_{ij.}/n_{ij.}$, independent of k. Unfortunately we do not here know the face totals
$m_{ij.}$ and are unable to make use of the proportionality as we shall in Case IV.
To solve for the r + s - 1 Lagrange multipliers we sum the members of eq.
(31) over j and then over i and arrive at the normal equations

(32)    $n_{i..}\lambda_{i..} + \sum_j n_{ij.}\lambda_{.j.} = m_{i..} - n_{i..}, \qquad i = 1, 2, \ldots, r,$
        $\sum_i n_{ij.}\lambda_{i..} + n_{.j.}\lambda_{.j.} = m_{.j.} - n_{.j.}, \qquad j = 1, 2, \ldots, s - 1.$


These can be reduced to s — 1 equations in precisely the same way that eqs. 
(20) were reduced, but because of the iterative process to come further on, we 
shall not pursue the reduction here. 

Case III: All three sets of slice totals known. All slice totals

        $N_{1..}, N_{2..}, \ldots, N_{r..}$
        $N_{.1.}, N_{.2.}, \ldots, N_{.s.}$
        $N_{..1}, N_{..2}, \ldots, N_{..t}$

now being known, in addition to conditions (27) and (30) we require here

(33)    $\sum_{i,j} m_{ijk} = m_{..k} = N_{..k}\,n/N, \qquad k = 1, 2, \ldots, t - 1,$

which makes a total of r + (s - 1) + (t - 1) or r + s + t - 2 conditions.
The same kind of manipulation as used heretofore gives

(34)    $m_{ijk} = n_{ijk}(1 + \lambda_{i..} + \lambda_{.j.} + \lambda_{..k})$

with $\lambda_{.s.}$ and $\lambda_{..t}$ to be counted zero. The adjustment is no longer proportionate
by slices or tubes, but involves every cell. In practice, once the normal
equations are solved and the Lagrange multipliers worked out, one proceeds
very much as in the two dimensional Case II: for each of the t slices, corresponding
to the t values of k, there will be a two dimensional adjustment, the
1 in eq. (19) being replaced now by $1 + \lambda_{..k}$.

The normal equations for the Lagrange multipliers can be found by performing
double summations on eq. (34). The result is

(35)    $n_{i..}\lambda_{i..} + \sum_j n_{ij.}\lambda_{.j.} + \sum_k n_{i.k}\lambda_{..k} = m_{i..} - n_{i..}, \qquad i = 1, 2, \ldots, r,$
        $\sum_i n_{ij.}\lambda_{i..} + n_{.j.}\lambda_{.j.} + \sum_k n_{.jk}\lambda_{..k} = m_{.j.} - n_{.j.}, \qquad j = 1, 2, \ldots, s - 1,$
        $\sum_i n_{i.k}\lambda_{i..} + \sum_j n_{.jk}\lambda_{.j.} + n_{..k}\lambda_{..k} = m_{..k} - n_{..k}, \qquad k = 1, 2, \ldots, t - 1.$

If these calculations were to be carried out, one would simplify the computation
by solving the top row for $\lambda_{i..}$, getting

(36)    $\lambda_{i..} = (1/n_{i..})\{m_{i..} - \sum_j n_{ij.}\lambda_{.j.} - \sum_k n_{i.k}\lambda_{..k}\} - 1$

and then substituting this into the middle and last rows of eqs. (35) to get a
reduced set of s + t - 2 normal equations for the Lagrange multipliers $\lambda_{.j.}$
and $\lambda_{..k}$, the numerical values of which when set back into eq. (36) give the $\lambda_{i..}$.
In all the summations of eqs. (35) and (36), $\lambda_{.s.}$ and $\lambda_{..t}$ would be counted zero.
But here again, the iterative process to be explained later will displace the use 
of normal equations, so actually we are not interested in reducing them. 

Case IV: One set of face totals known. It may be that the rs face totals

        $N_{11.}, N_{12.}, \ldots, N_{21.}, \ldots, N_{rs.}$

are known from crossing the i and j characters in the universe. The conditions
are then

(37)    $\sum_k m_{ijk} = m_{ij.} = N_{ij.}\,n/N, \qquad i = 1, 2, \ldots, r; \quad j = 1, 2, \ldots, s.$

The adjustment here turns out to be

(38)    $m_{ijk} = n_{ijk}(1 + \lambda_{ij.});$

but by summing both sides over the index k to evaluate $\lambda_{ij.}$ it is seen that

(39)    $m_{ij.} = n_{ij.}(1 + \lambda_{ij.}),$

whence

(40)    $m_{ijk} = n_{ijk}(m_{ij.}/n_{ij.}).$

This adjustment is thus proportionate by tubes, like that in eq. (31), though
here the factor $m_{ij.}/n_{ij.}$ is known and eq. (40) can be applied at once.
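The adjustment by tubes of eq. (40) can be sketched in the same way. The sample frequencies and face totals below are hypothetical, the name `adjust_by_tubes` is an assumption introduced here, and every sample face total $n_{ij.}$ is assumed to be positive so that the ratio is defined.

```python
def adjust_by_tubes(n, m_face):
    """Multiply every cell of the ij-th tube by m_ij. / n_ij. (the form of eq. (40)).

    n is a nested list indexed n[i][j][k]; m_face[i][j] is the expected face total m_ij. .
    """
    adjusted = []
    for slice_i, targets_i in zip(n, m_face):
        new_slice = []
        for tube, target in zip(slice_i, targets_i):
            factor = target / sum(tube)              # m_ij. / n_ij.
            new_slice.append([cell * factor for cell in tube])
        adjusted.append(new_slice)
    return adjusted


# Hypothetical sample frequencies and expected face totals (illustrative only).
n = [[[10, 20], [30, 40]],
     [[5, 15], [25, 35]]]
m_face = [[33.0, 63.0],
          [22.0, 66.0]]

m = adjust_by_tubes(n, m_face)
print(m[0][1])   # the tube i = 1, j = 2 after adjustment
```

Within each tube the ratio of adjusted to sample frequency is the same for every k, which is what "proportionate by tubes" means here.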

Case V: One set of face totals, and one set of slice totals known. Sometimes, in
addition to the rs face totals of Case IV, the slice totals

        $N_{..1}, N_{..2}, \ldots, N_{..t}$

will also be known, in which circumstances the conditions (37) are to be accompanied
by

(41)    $\sum_{i,j} m_{ijk} = m_{..k} = N_{..k}\,n/N, \qquad k = 1, 2, \ldots, t - 1.$

The same procedure as previously applied yields now

(42)    $m_{ijk} = n_{ijk}(1 + \lambda_{ij.} + \lambda_{..k})$

with $\lambda_{..t}$ to be counted zero. Summations performed over k, and then over i
and j together, give the normal equations

(43)    $n_{ij.}\lambda_{ij.} + \sum_k n_{ijk}\lambda_{..k} = m_{ij.} - n_{ij.},$
        $\sum_{i,j} n_{ijk}\lambda_{ij.} + n_{..k}\lambda_{..k} = m_{..k} - n_{..k}.$

The number of equations is rs + t - 1, since $\lambda_{..t}$ does not exist. As before,
a simplification can be effected by solving the top row for $\lambda_{ij.}$ and making a
substitution into the lower one, but because of the great advantage of the
iterative process to be seen further on, we shall not carry out the reduction.

Before going on it might be noted that although this case is three dimensional,
it reduces to the two dimensional Case II if one considers that ij. is one index
running through the values 11, 12, ..., 21, 22, ..., rs, and that ..k is a second
index running through the values 1, 2, ..., t. This can be seen by the similarity
between eqs. (43) and (20).

Case VI: Two sets of face totals known. If in addition to the face totals of
Case IV, the face totals

        $N_{.11}, N_{.12}, \ldots, N_{.st}$

are also known from further crossing the j and k characters in the universe, we
shall require

(44)    $\sum_i m_{ijk} = m_{.jk} = N_{.jk}\,n/N, \qquad j = 1, 2, \ldots, s; \quad k = 1, 2, \ldots, t - 1,$

in addition to the conditions (37). In place of eq. (40) of Case IV we now
find that

(45)    $m_{ijk} = n_{ijk}(1 + \lambda_{ij.} + \lambda_{.jk})$

in which $\lambda_{.jt}$ is to be counted zero for all j. No simple relation such as eq. (40)
is possible here, because the adjustment is not proportionate by tubes; the
Lagrange multipliers must be evaluated. This can be accomplished by summing
the members of eq. (45) over k and i in turn, resulting in the normal equations

(46)    $n_{ij.}\lambda_{ij.} + \sum_k n_{ijk}\lambda_{.jk} = m_{ij.} - n_{ij.},$
        $\sum_i n_{ijk}\lambda_{ij.} + n_{.jk}\lambda_{.jk} = m_{.jk} - n_{.jk}.$

Since $\lambda_{.jt}$ does not exist for any values of j, the number of equations is
rs + s(t - 1) = s(r + t - 1). They break up at once into s sets each of
r + t - 1 equations, one set for every j value. In fact, the problem can be
considered as s sets of the two dimensional Case II. Any one value of j gives
a slice, which can be looked upon as fulfilling the specifications of the two
dimensional Case II. Each set of normal equations can be reduced in the same
manner that eqs. (20) were reduced.

Case VII: All three sets of face totals known. All totals now being known,
we require

(37)    $\sum_k m_{ijk} = m_{ij.} = N_{ij.}\,n/N,$

(44)    $\sum_i m_{ijk} = m_{.jk} = N_{.jk}\,n/N,$

(47)    $\sum_j m_{ijk} = m_{i.k} = N_{i.k}\,n/N.$

The adjusting relation is

(48)    $m_{ijk} = n_{ijk}(1 + \lambda_{ij.} + \lambda_{.jk} + \lambda_{i.k})$

in which $\lambda_{.jt}$ is to be counted zero for any j, $\lambda_{r.k}$ for any k, and $\lambda_{i.t}$ for any i.
The normal equations for the Lagrange multipliers are

(49)    $n_{ij.}\lambda_{ij.} + \sum_k n_{ijk}\lambda_{.jk} + \sum_k n_{ijk}\lambda_{i.k} = m_{ij.} - n_{ij.},$
        $\sum_i n_{ijk}\lambda_{ij.} + n_{.jk}\lambda_{.jk} + \sum_i n_{ijk}\lambda_{i.k} = m_{.jk} - n_{.jk},$
        $\sum_j n_{ijk}\lambda_{ij.} + \sum_j n_{ijk}\lambda_{.jk} + n_{i.k}\lambda_{i.k} = m_{i.k} - n_{i.k},$

being rs + rt + st - r - s - t + 1 in number. They can be reduced in the
same way that previous normal equations have been reduced; but here again,
the iterative process will render the use of normal equations unnecessary, except
for theoretical purposes, e.g. justification of the iterative process.


5. A simplified procedure—iterative proportions. It is well known in least 
squares that the number of Lagrange multipliers in any problem is equal to the 
number of conditions imposed on the adjustment. Here the conditions have 
appeared in sets, depending on which marginal totals are involved. By a com- 
parison of eqs. (15) and (29) on the one hand, with eqs. (19), (31), (34), (42), 
(45), and (48) on the other, we see that wherever there was only one set of 
marginal totals involved we came out with a proportionate adjustment, but 
that in all other cases it was not so; the Lagrange multipliers involved were 
unfortunately related to one another through normal equations. We now make 
the observation, however, that as a first approximation the adjustments may 
all be considered proportionate, and we shall be able to write down an expression 
for the error in this approximation, and shall be able to eliminate it by a suc- 
cession of proportionate adjustments. 

Take the two dimensional Case II for an example. In eq. (21) one may
recognize $(1/n_{i.})\sum_j n_{ij}\lambda_{.j}$ as a weighted average of $\lambda_{.j}$ for the ith row. There
will be a weighted average of $\lambda_{.j}$ for the first row, another for the second, etc.,
one for each value of i; consequently one may appropriately speak of the ith
average of $\lambda_{.j}$, writing it $i\text{-av.}\,\lambda_{.j}$. Substituting from eq. (21) into (19) one
then sees the adjustment (19) appear as

(50)    $m_{ij} = n_{ij}(m_{i.}/n_{i.} + \lambda_{.j} - i\text{-av.}\,\lambda_{.j}).$

If, on the other hand, $\lambda_{.j}$ had been eliminated from eqs. (20), instead of $\lambda_{i.}$,
the result would have been

(51)    $m_{ij} = n_{ij}(m_{.j}/n_{.j} + \lambda_{i.} - j\text{-av.}\,\lambda_{i.}).$

From either eq. (50) or (51) it is clear why the adjustment (19) is not proportionate
by rows or columns, and why Case II does not break up into r or s sets
of Case I: the reason is that $\lambda_{.j}$ in any cell is not necessarily equal to the average
$\lambda_{.j}$ for that row, nor is $\lambda_{i.}$ in any cell necessarily equal to the average $\lambda_{i.}$ for
that column. If nevertheless one were to make the simple proportionate
adjustment

(52)    $m'_{ij} = n_{ij}(m_{i.}/n_{i.})$

along the horizontal in the ith row, the horizontal conditions (4) will be enforced
but not the vertical ones (5); i.e., it will be found that $m'_{i.} = m_{i.}$, but
that usually not all $m'_{.j} = m_{.j}$. This is because eq. (52) effects only a partial
adjustment, each $m'_{ij}$ being in error through the disparity between the $\lambda_{.j}$ proper
to the jth column, and the average of all the $\lambda_{.j}$ for the ith row, as seen in
eq. (50). This error can then be diminished by turning the process around and
subjecting these $m'_{ij}$ to a proportionate adjustment in the vertical according to
the equation

(53)    $m''_{ij} = m'_{ij}(m_{.j}/m'_{.j})$

which may be considered an application of eq. (51) wherein the disparity between
any $\lambda_{i.}$ and the average $\lambda_{i.}$ for the jth column has been neglected. It is
the vertical conditions that will now be found satisfied, but perhaps not all of
the horizontal ones, because some of the row totals may have been disturbed.
The cycle initiated by eq. (52) is therefore repeated, and the process is con- 
tinued until the table reproduces itself and becomes rigid with the satisfaction 
of all the conditions, both horizontal and vertical. The final results coincide 
with the least squares solution, which is thus accomplished without the use of 
normal equations. 

Usually two cycles suffice. In practice the work proceeds rapidly, requiring 
only about one-seventh as much time as setting up the normal equations and 
solving them. The tables III-V show the various stages of the work when 
the method of iterative proportions is applied to the sample frequencies of 
Table I. It will be noticed that the results of the third approximation (Table V)
are final, since if the process were continued, the table would only reproduce 
itself. 
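The cycle of eqs. (52) and (53) is easy to mechanize. The following Python sketch is an illustration only, not the computing routine used for the tables: the function names and the stopping rule (`tol`, `max_cycles`) are assumptions introduced here. It is seeded with the first-stage figures printed in Table III, and it also repeats the single column adjustment that leads from Table III to Table IV.

```python
def adjust_rows(table, row_targets):
    """Eq. (52): scale each row to its expected total."""
    return [[cell * target / sum(row) for cell in row]
            for row, target in zip(table, row_targets)]


def adjust_columns(table, col_targets):
    """Eq. (53): scale each column to its expected total."""
    col_sums = [sum(col) for col in zip(*table)]
    return [[cell * target / total
             for cell, target, total in zip(row, col_targets, col_sums)]
            for row in table]


def iterative_proportions(table, row_targets, col_targets, tol=1e-6, max_cycles=50):
    """Repeat the row/column cycle until the row totals stop moving."""
    m = [[float(cell) for cell in row] for row in table]
    for cycle in range(1, max_cycles + 1):
        m = adjust_columns(adjust_rows(m, row_targets), col_targets)
        row_gap = max(abs(sum(row) - t) for row, t in zip(m, row_targets))
        if row_gap < tol:
            break
    return m, cycle


# First-stage figures as printed in Table III, with the expected marginal totals.
table_iii = [[3608, 778, 555, 312],
             [1586, 399, 254, 157],
             [1606, 433, 273, 120],
             [10476, 2441, 1696, 1153],
             [1660, 349, 169, 152],
             [3910, 863, 548, 341]]
row_targets = [5252, 2395, 2432, 15766, 2330, 5662]
col_targets = [22877, 5285, 3462, 2213]

one_column_step = adjust_columns(table_iii, col_targets)   # compare with Table IV
final, cycles = iterative_proportions(table_iii, row_targets, col_targets)
print(cycles)
print([[round(cell) for cell in row] for row in final])
```

Because Table III is printed in whole numbers, a recomputed figure may differ from the printed Table IV by a unit in an occasional cell.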

TABLE III

The method of iterative proportions applied to the data of Table I. First stage:
A proportionate adjustment by rows by eq. (52). Note that $m'_{i.} = m_{i.}$,
but that $m'_{.j} \neq m_{.j}$

                j=1       2       3       4     $m'_{i.}$    $m_{i.}$

  i=1          3608     778     555     312      5253       5252
    2          1586     399     254     157      2396       2395
    3          1606     433     273     120      2432       2432
    4         10476    2441    1696    1153     15766      15766
    5          1660     349     169     152      2330       2330
    6          3910     863     548     341      5662       5662

  $m'_{.j}$   22846    5263    3495    2235     33839
  $m_{.j}$    22877    5285    3462    2213     33837

TABLE IV

A continuation of the process initiated in Table III. The figures in Table III
are now adjusted proportionately by columns according to eq. (53). The vertical
totals $m''_{.j}$ and $m_{.j}$ now are equal, but the agreement of the horizontal totals
accomplished in Table III has been slightly disturbed

                j=1       2       3       4     $m''_{i.}$   $m_{i.}$

  i=1          3613     781     550     309      5253       5252
    2          1588     401     252     155      2396       2395
    3          1608     435     270     119      2432       2432
    4         10490    2451    1680    1142     15763      15766
    5          1662     350     167     151      2330       2330
    6          3915     867     543     338      5663       5662

  $m''_{.j}$  22876    5285    3462    2214     33837
  $m_{.j}$    22877    5285    3462    2213     33837

TABLE V

The cycle is commenced again. The figures of Table IV are subjected to a proportionate
adjustment by rows, according to eq. (52). And since these results turn
out to be almost a reproduction of Table IV but with both horizontal and vertical
conditions satisfied, they are considered final. The agreement with the $m_{ij}$ in
Table I should be noted

                j=1       2       3       4     $m_{i.}$

  i=1            …       …       …       …      5252
    2            …       …       …       …      2395
    3            …       …       …       …      2432
    4            …       …       …       …     15765
    5            …       …       …       …      2330
    6            …       …       …       …        …

The same process can be extended to three or more dimensions with an even
greater relative saving in time. To see how the method of iterative proportions
applies in one of the three dimensional cases, we may go back to Case III. By
the substitution afforded through eq. (36) the adjusting eq. (34) may be put
into the form

(54)    $m_{ijk} = n_{ijk}(m_{i..}/n_{i..} + \lambda_{.j.} + \lambda_{..k} - i\text{-av.}\,\lambda_{.j.} - i\text{-av.}\,\lambda_{..k}).$

Equally well it could have been written

(55)    $m_{ijk} = n_{ijk}(m_{.j.}/n_{.j.} + \lambda_{i..} + \lambda_{..k} - j\text{-av.}\,\lambda_{i..} - j\text{-av.}\,\lambda_{..k}),$

or

(56)    $m_{ijk} = n_{ijk}(m_{..k}/n_{..k} + \lambda_{i..} + \lambda_{.j.} - k\text{-av.}\,\lambda_{i..} - k\text{-av.}\,\lambda_{.j.}).$

Any of these three equations shows why the adjustment (34) is not proportional
by slices, and why this case does not break up into r or s or t sets of the
three dimensional Case I. As a first approximation it does, as is now clear
from these three equations, and by making successive proportionate adjustments
we may thus arrive at the least squares values. To go about the work
we could first calculate the values of

(57)    $m'_{ijk} = n_{ijk}(m_{i..}/n_{i..})$

then

(58)    $m''_{ijk} = m'_{ijk}(m_{.j.}/m'_{.j.})$

followed by

(59)    $m'''_{ijk} = m''_{ijk}(m_{..k}/m''_{..k}).$

These three successive adjustments would constitute a cycle, which would then
be repeated in whole or in part until the table becomes rigid with the satisfaction
of all three sets of conditions.
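The three-step cycle of eqs. (57) to (59) can be sketched for a small table as follows. The 2 × 2 × 2 sample and the three sets of expected slice totals are hypothetical numbers chosen so that each set sums to the same grand total; the helper names `margin`, `scale`, and `iterative_proportions_3d` and the stopping rule are assumptions introduced here.

```python
def margin(m, axis):
    """One-way totals of m[i][j][k] for axis 0 (i), 1 (j), or 2 (k)."""
    sizes = (len(m), len(m[0]), len(m[0][0]))
    totals = [0.0] * sizes[axis]
    for i in range(sizes[0]):
        for j in range(sizes[1]):
            for k in range(sizes[2]):
                totals[(i, j, k)[axis]] += m[i][j][k]
    return totals


def scale(m, axis, targets):
    """One proportionate adjustment by slices along the given axis."""
    current = margin(m, axis)
    return [[[m[i][j][k] * targets[(i, j, k)[axis]] / current[(i, j, k)[axis]]
              for k in range(len(m[0][0]))]
             for j in range(len(m[0]))]
            for i in range(len(m))]


def iterative_proportions_3d(n, m_i, m_j, m_k, tol=1e-9, max_cycles=200):
    """Repeat the cycle (57), (58), (59) until all three sets of totals agree."""
    m = n
    for _ in range(max_cycles):
        m = scale(m, 0, m_i)      # eq. (57)
        m = scale(m, 1, m_j)      # eq. (58)
        m = scale(m, 2, m_k)      # eq. (59)
        gap = max(abs(a - b)
                  for axis, targets in enumerate((m_i, m_j, m_k))
                  for a, b in zip(margin(m, axis), targets))
        if gap < tol:
            break
    return m


# Hypothetical sample and expected slice totals (each set sums to 180).
n = [[[10, 20], [30, 40]],
     [[5, 15], [25, 35]]]
m_i, m_j, m_k = [95.0, 85.0], [55.0, 125.0], [75.0, 105.0]

m = iterative_proportions_3d(n, m_i, m_j, m_k)
print(margin(m, 0), margin(m, 1), margin(m, 2))
```

The three sets of expected totals must share one grand total, otherwise no table can satisfy all the conditions and the cycle cannot settle.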


6. Simplification when only one cell requires adjustment. On occasions it
happens in sampling work that one is especially interested in one particular cell
of the universe, and would like to have a result for it in advance before the other
cells are adjusted. Sometimes it even happens that the others individually
are of no particular concern. In such circumstances one merely places the cell
of interest in one corner of the table by an appropriate interchange of rows and
columns, and then compresses the rest of the table into the cells adjacent to it.
In the two dimensional Case II one would thus work with a 2 × 2 table, one
corner cell being the one of special interest, the other three being the result of
compression. The marginal totals of the row and column belonging to the cell
of interest are unaffected. For illustration we may suppose that from the
sample shown in Table I we require only $m_{61}$. We then start with the 2 × 2
Table VI, which is derived from Table I by compression. Commencing with
Table VI, one might first adjust by rows according to eq. (52), then by columns
by eq. (53). One cycle of iterative proportions is sufficient, as is seen in Table
VII, and the value 3915 found for $m_{61}$ is in good agreement with its value shown
in Tables I and V. The scheme of compression provides a quick method of
getting out an advance adjustment for a cell of special interest, and the result
so obtained will ordinarily be in good agreement with what comes later when
and if all the cells are adjusted.

TABLE VI
Derived from Table I by compression, the cell i = 6, j = 1, requiring adjustment

                j=1     j=2-4     $n_{i.}$     $m_{i.}$

  i=1-5          …        …       28215       28175
  i=6            …        …         …           …

TABLE VII
A proportionate adjustment of Table VI

    Rows adjusted by eq. (52)          Columns adjusted by eq. (53)

    18938     9237    28175            18962     9213    28175
     3910     1752     5662             3915     1747     5662

    22848    10989    33837            22877    10960    33837

    Conclusion: $m_{61}$ = 3915
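The column adjustment shown in the right-hand half of Table VII can be reproduced from its left-hand half. This sketch takes the row-adjusted figures and the expected column totals exactly as printed in Table VII; the variable names are assumptions introduced here.

```python
# Left half of Table VII: the compressed 2 x 2 table after the row adjustment (52).
rows_adjusted = [[18938.0, 9237.0],    # rows i = 1-5 compressed
                 [3910.0, 1752.0]]     # row  i = 6
col_targets = [22877.0, 10960.0]       # expected totals for j = 1 and j = 2-4

# Eq. (53): scale each column to its expected total.
col_sums = [sum(col) for col in zip(*rows_adjusted)]
cols_adjusted = [[cell * target / total
                  for cell, target, total in zip(row, col_targets, col_sums)]
                 for row in rows_adjusted]

rounded = [[round(cell) for cell in row] for row in cols_adjusted]
m_61 = rounded[1][0]
print(rounded, m_61)
```

Only four cells are handled, which is the saving in labor that compression is meant to give when a single cell is wanted.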

In the three dimensional Cases II, III, V, VI, and VII, one compresses the
original table to a 2 × 2 × 2 table, and then uses the method of iterative proportions.
(The other cases do not require consideration, since they are proportionate
adjustments wherein one is already at liberty to adjust as few or as
many cells as he likes without altering the equations or the routine.) The same
procedure can be extended to the adjustment of two cells, the only modification
being that in two dimensions we shall compress to a 2 × 3 or a 3 × 3 table,
depending on whether the two cells do or do not lie in the same row or column.
In three dimensions we compress to a 2 × 2 × 3, or a 2 × 3 × 3, or a 3 × 3 × 3
table; the first if the two cells lie in the same i, j, or k tube, the second if they
lie in the same slice but not in the same tube, the third if they are in separate
slices.


7. Some remarks on the accuracy of an adjustment. A least squares adjust- 
ment of sampling results must be regarded as a systematic procedure for 
obtaining satisfaction of the conditions imposed, and at the same time effecting 
an improvement of the data in the sense of obtaining results of smaller variance 
than the sample itself, under ideal conditions of sampling from a stable universe. 
It must not be supposed that any or all of the adjusted $m_{ij}$ in any table are
necessarily "closer to the truth" than the corresponding sampling frequencies
$n_{ij}$, even under ideal conditions. As for the standard errors of the adjusted
results, they can easily be estimated for the ideal case by making use of the 
calculated chi-square. For predictive purposes, however (which can be regarded 
as the only possible use of a census by any method, sample or complete), it is 
far preferable, in fact necessary, to get some idea of the errors of sampling by 
actual trial, such as by a comparison of the sampling results with the universe, 
as can often be arranged by means of controls. There is another aspect to the 
problem of error—even a 100 per cent count, even though strictly accurate, is 
not by itself useful for prediction, except so far as we can assert on other grounds 
what secular changes are taking place. 


In conclusion it is a pleasure to record our appreciation of the assistance of 
Miss Irma D. Friedman and Mr. Wilson H. Grabill for putting the formulas 
and procedure into actual operation with census data, and thereby disclosing 
defects in earlier drafts of the manuscript. 


BUREAU OF THE CENSUS, 
WASHINGTON 





NOTES 


This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 




THE STANDARD ERRORS OF THE GEOMETRIC AND HARMONIC
MEANS AND THEIR APPLICATION TO INDEX NUMBERS¹


By Nilan Norris


Attempts to derive useful expressions for estimating the standard deviations 
of the sampling errors of the geometric and harmonic means have not yielded 
results comparable with those afforded by the modern theory of estimation, 
including fiducial inference. There are in the literature of probability theory 
certain theorems which can be applied to obtain these desired results in a 
straightforward manner. The use of forms for estimating standard errors is 
subject to certain conditions which are not always fulfilled, particularly in the 
case of time series. An understanding of these limitations should deter those 
who may be tempted to judge the significance of phenomena such as price 
changes solely on the basis of estimated standard errors of indexes. 


1. Statement of formulas. The standard error of the geometric mean of a
sequence of positive independent chance variables denoted by $x_i = x_1, x_2, \ldots, x_n$,
is $\sigma_G = \theta_1 \sigma_{\log x}/\sqrt{n}$, where $\theta_1$ is the population geometric mean of the variates;
so that $\sigma_{\log x}$ is the standard deviation of the logarithms in the population as
given by $\sigma_{\log x} = [E\{[\log x - E(\log x)]^2\}]^{1/2}$; and n is the number of individuals
comprising the sample. The estimate of the standard error of the geometric
mean is $s_G = G\, s_{\log x_i}/\sqrt{n - 1}$, where G is the sample geometric mean, that is, the
estimate of $\theta_1$; so that $s_{\log x_i}$ is the estimate of $\sigma_{\log x}$; and n - 1 is the degree of
freedom of the sample.


1 This article summarizes two papers presented at sessions of the Institute of Mathematical
Statistics at Detroit, Michigan on December 27, 1938, and at Philadelphia, Pennsylvania
on December 27, 1939. The results given herein can be derived by several methods,
which vary somewhat as to degree of rigor. The writer wishes to acknowledge his
indebtedness to the referee for suggesting a proof based on a probability theorem stated
by J. L. Doob, "The limiting distributions of certain statistics," Annals of Math. Stat.,
Vol. 6 (1935), pp. 160-169. The standard deviation formulas obtained follow as an application
of this theorem, as will be seen by reference to it. Obviously the asymptotic variance
formulas of many other statistics (estimates of parameters) can be obtained in a similar
manner.




The standard error of the harmonic mean of a sequence of positive independent
chance variables denoted by $x_i = x_1, x_2, \ldots, x_n$, is $\sigma_H = \theta_2^2\, \sigma_{1/x}/\sqrt{n}$,
where the population harmonic mean of the variates is $\theta_2 = 1/\alpha = [E(1/x)]^{-1}$;
so that the standard deviation of 1/x in the population is $\sigma_{1/x} = [E\{[1/x - E(1/x)]^2\}]^{1/2}$;
and n is the number of observations comprising the sample. The
estimate of the standard error of the harmonic mean is $s_H = s_{1/x_i}/(a^2\sqrt{n - 1})$, where
the estimate of $\alpha$ is given by $a = 1/H = (1/n)\sum 1/x_i$; in which $s_{1/x_i}$ is the standard
deviation of the reciprocals of the observations comprising the sample; and
n - 1 is the degree of freedom of the sample.
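The two estimates, $s_G = G\,s_{\log x_i}/\sqrt{n-1}$ and $s_H = s_{1/x_i}/(a^2\sqrt{n-1})$, can be computed as in the sketch below. It rests on an assumption that the text does not settle: the sample standard deviations are taken with divisor n, so that the factor $\sqrt{n-1}$ supplies the degrees of freedom. The function names and the three observations are hypothetical.

```python
import math


def geometric_mean_se(x):
    """Return (G, s_G) with s_G = G * s_log / sqrt(n - 1).

    Assumption: s_log is the standard deviation of log x_i with divisor n.
    """
    n = len(x)
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / n
    s_log = math.sqrt(sum((u - mean_log) ** 2 for u in logs) / n)
    G = math.exp(mean_log)
    return G, G * s_log / math.sqrt(n - 1)


def harmonic_mean_se(x):
    """Return (H, s_H) with s_H = s_recip / (a**2 * sqrt(n - 1)), a = 1/H.

    Assumption: s_recip is the standard deviation of 1/x_i with divisor n.
    """
    n = len(x)
    recips = [1.0 / v for v in x]
    a = sum(recips) / n
    s_recip = math.sqrt(sum((u - a) ** 2 for u in recips) / n)
    return 1.0 / a, s_recip / (a ** 2 * math.sqrt(n - 1))


x = [1.0, 2.0, 4.0]                 # hypothetical positive observations
G, s_G = geometric_mean_se(x)
H, s_H = harmonic_mean_se(x)
print(G, s_G)
print(H, s_H)
```

Both functions require strictly positive observations and at least two of them.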


2. Derivation of formulas. These forms can be obtained by application of
the Laplace-Liapounoff theorem² as follows: Let $x_i = x_1, x_2, \ldots, x_n$ be a set of
positive independent chance variables with the same distribution functions,
where the expectations, $E(x_i)$ and $E(x_i^2)$ exist, and where $\sigma_x^2 = E\{[x_i - E(x_i)]^2\} > 0$.
The last condition is imposed to eliminate the trivial case in which the $x_i$
are all equal and their distribution is confined to a single point. The geometric
mean of the $x_i$ is $G = (x_1 \cdot x_2 \cdots x_n)^{1/n}$, and the harmonic mean of the $x_i$ is

        $H = \left[\frac{1}{n}\sum_{i=1}^{n} \frac{1}{x_i}\right]^{-1}.$

It is necessary to assume that both $\sigma_{\log x}$ and $\sigma_{1/x}$ are finite, and that in the
case of both log x and 1/x at least one moment of order higher than two of the
respective variates is also finite. The requirement that the variance and at
least one moment higher than the variance be finite can be weakened in various
ways, but this is a trivial consideration, since nearly all distributions of any
importance have finite third moments.³ Certain rarely occurring types of
distributions, such as the Cauchy distribution, have infinite variance. In such
cases, standard error formulas as ordinarily used are not valid.

Let $E(\log x) = \zeta$, and $E(1/x) = \alpha$. By the Laplace-Liapounoff theorem,
except for terms of order $1/\sqrt{n}$, the limiting distributions of

        $\frac{\sqrt{n}(\log G - \zeta)}{\sigma_{\log x}}$   and   $\frac{\sqrt{n}(H^{-1} - \alpha)}{\sigma_{1/x}}$

are normal with zero arithmetic means and unit variances.
That is, if C represents a set of conditions on chance variables, and P{C} is the
probability that these conditions are satisfied, then

        $\lim_{n\to\infty} P\left\{\frac{\sqrt{n}(\log G - \zeta)}{\sigma_{\log x}} < t\right\} = \lim_{n\to\infty} P\left\{\frac{\sqrt{n}(H^{-1} - \alpha)}{\sigma_{1/x}} < t\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-z^2/2}\,dz.$


2 A. Khintchine, Asymptotische Gesetze der Wahrscheinlichkeitsrechnung, Ergebnisse
der Mathematik und ihrer Grenzgebiete, J. Springer, Berlin, 1933, Vol. II, No. 4, pp. 1-8;
J. L. Doob, op. cit., pp. 160-169; and S. S. Wilks, Statistical Inference, 1936-1937, Edwards
Brothers, Inc., Ann Arbor, 1937, pp. 39 f.
3 For a more detailed discussion of this matter see Wilks, op. cit., pp. 39 f.


In order to use these relations in obtaining the limiting distributions of the
geometric and harmonic means, it is necessary to suppose that the sequence of
random chance variables, $V_n$, converges in probability (converges stochastically)
to $\rho$, and that the sequence of random chance variables, $\sqrt{n}(V_n - \rho)$, has
a normal limiting distribution with zero arithmetic mean and variance $\sigma^2$.
Also, it is necessary to assume that the real-valued function, f(x), has a Taylor
expansion valid in the neighborhood of $\rho$. If $f'(\rho) \neq 0$, only the first two terms
of the series are needed. The required expansion is given by

        $f(x) = f(\rho) + (x - \rho)f'(\rho) + \frac{(x - \rho)^2}{2}\, f''[\rho + \beta(x - \rho)]$

where $0 < \beta < 1$, and $f''(x)$ is continuous in the neighborhood of $\rho$. When these
conditions are fulfilled, the limiting distribution of $\sqrt{n}[f(V_n) - f(\rho)]$ is normal
with an arithmetic mean of zero and a variance of $\sigma^2[f'(\rho)]^2$.

Let $f(\log G) = e^{\log G}$, and use the expansion given by
$e^{\log G} = e^{\zeta} + (\log G - \zeta)e^{\zeta} + \tfrac{1}{2}(\log G - \zeta)^2 e^{\zeta + \beta(\log G - \zeta)}$.
Since $\theta_1 = e^{\zeta}$, it follows that the limiting distribution
of $\sqrt{n}(G - \theta_1)$ is normal with an arithmetic mean of zero and a variance of
$\theta_1^2 \sigma_{\log x}^2$.

Similarly, it can be shown that the limiting distribution of $\sqrt{n}(H - \theta_2)$ is
normal with an arithmetic mean of zero and a variance of $\theta_2^4 \sigma_{1/x}^2$, where
$\theta_2 = 1/\alpha = [E(1/x)]^{-1}$.


It is of some interest to observe that the expressions for the standard errors 
of the geometric and harmonic means correspond with the forms previously 


given for the standard errors of two efficient ratio-measures of relative variation,⁴
namely, 
Cola = 01 Cajeo, and oxy = 63 o 
G2 ’ /@ 6 G/H » 
where $\theta_1/\theta$ is the population geometric-arithmetic ratio, and $\theta_2/\theta_1$ is the population
harmonic-geometric ratio.


3. Limitations of standard-error estimates. Application of these forms is 
subject to the usual conditions for drawing sound inferences on the basis of the 
representative method. Fiducial argument should be employed to avoid certain 
untenable assumptions of the outmoded method of using standard errors. 
Estimates of the standard deviations of sampling errors do not constitute an 
ultimate test of significance which can be applied with a high degree of success 
to all types of problems. In general, such estimates cannot be relied upon with a 


4 Nilan Norris, "Some efficient measures of relative dispersion," Annals of Math. Stat.,
Vol. 9 (1938), pp. 214-220.


high degree of confidence when they are used as tests of significance for index 
numbers, since in nearly all time series there exists an appreciable degree of 


serial correlation, persistence, or lack of independence among successive items of 
any sample. 


4. Bibliographical note. Certain aspects of the sampling distribution of the
geometric mean have been discussed by Burton H. Camp.⁵ Attempts to derive
forms for estimating the standard errors of index numbers have been made by
Truman L. Kelley⁶ and Irving Fisher,⁷ and an empirical study of the sampling
fluctuations of indexes has been made by E. C. Rhodes.⁸ Although various
special tests of significance for time series have been proposed,⁹ at the present
time no generally satisfactory procedure has appeared.


HUNTER COLLEGE,
NEW YORK, N. Y.


5 Burton H. Camp, "Notes on the distribution of the geometric mean," Annals of Math.
Stat., Vol. 9 (1938), pp. 221-226.

6 Truman L. Kelley, "Certain Properties of Index Numbers," Quarterly Publications of
Am. Stat. Assn., Vol. 17, New Series 135, Sept., 1921, pp. 826-841.

7 Irving Fisher, The Making of Index Numbers, Houghton Mifflin Company, New York,
1927, 3d ed., pp. 225-229, 342-345, and Appendix I, pp. 407 and 430 f.

8 E. C. Rhodes, "The precision of index numbers," Roy. Stat. Soc. Jour., Vol. 99 (1936),
Part I, pp. 142-146, and Part II, pp. 367-369.

9 Some of the more recent papers dealing with this matter are: G. Tintner, "On tests of
significance in time series," Annals of Math. Stat., Vol. 10 (1939), pp. 139-143; "The analysis
of economic time series," Am. Stat. Assn. Jour., Vol. 35 (1940), pp. 93-100; L. R. Hafstad,
"On the Bartels technique for time-series analysis, and its relation to the analysis of
variance," Am. Stat. Assn. Jour., Vol. 35 (1940), pp. 347-361; and Lila F. Knudsen, "Interdependence
in a series," Am. Stat. Assn. Jour., Vol. 35 (1940), pp. 507-514.


A NOTE ON THE USE OF A PEARSON TYPE III FUNCTION IN 
RENEWAL THEORY 


By A. W. Brown 


One of the methods suggested by A. J. Lotka¹ for the derivation of the renewal
function may be briefly summarized as follows.

The method consists of dissecting the total renewal function into "generations".
The original installation constitutes the zero generation, the units
introduced to replace disused units of the zero generation constitute the first
generation, renewal of these the second, and so on. Let f(x) be the "mortality"
function, the same for all generations. f(x) is a function satisfying the usual
conditions of a distribution function. Adopting Lotka's notation, let N be the
number of units in the original collection, $B_1(t)\,dt$ the number of objects introduced
between times t and t + dt and belonging to the first generation, $B_2(t)\,dt$
a similar expression for the second generation, etc. $B_1(t)/N, B_2(t)/N, \ldots$ may
be regarded as renewal density functions for the various generations.

Now, evidently,

(1)    $B_1(t) = N f(t)$

(2)    $B_2(t) = \int_0^t B_1(t - x) f(x)\,dx$

and in general

(3)    $B_{j+1}(t) = \int_0^t B_j(t - x) f(x)\,dx.$

Summation of the contributions of the successive generations gives for the total
renewal at the time t

(4)    $B(t) = B_1(t) + \int_0^t B(t - x) f(x)\,dx.$


1 A. J. Lotka, "A Contribution to the Theory of Self Renewing Aggregates, With Special
Reference to Industrial Replacement," Annals of Math. Stat., Vol. 10 (1939), p. 1.


In this note we propose to use a Pearson Type III function for f(z) and observe 
what form our equations then assume. The Pearson Type III function 
a a*e-* (c > 0, k > 0), appears to be a reasonable one to use in many 
practical situations. The two parameters c and k give it a considerable amount 
of flexibility. The fact that this function has an unlimited range in one direc- 
tion is relatively unimportant from a practical point of view, as is well known 
from the experience of fitting curves of this type to skewed data with limited 
range. Of course the question of whether a Type III curve is appropriate can 
be answered more objectively by using the usual Pearson curve-fitting criteria, 
Bi, Be and k. We have, then, substituting in (1) 


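$$B_1(t) = N \frac{c^k}{\Gamma(k)} t^{k-1} e^{-ct} \tag{5}$$

and from (2)

$$B_2(t) = N \int_0^t \frac{c^k}{\Gamma(k)} (t - x)^{k-1} e^{-c(t-x)} \, \frac{c^k}{\Gamma(k)} x^{k-1} e^{-cx}\,dx \tag{6}$$

$$= \frac{N c^{2k}}{[\Gamma(k)]^2} e^{-ct} \int_0^t (t - x)^{k-1} x^{k-1}\,dx. \tag{7}$$

If, now, we set x = ty, the integral in (7) reduces to

$$\int_0^t (t - x)^{k-1} x^{k-1}\,dx = t^{2k-1} \frac{\Gamma(k)\Gamma(k)}{\Gamma(2k)}.$$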










450 A. W. BROWN 
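Hence,

$$B_2(t) = N \frac{c^{2k}}{\Gamma(2k)} t^{2k-1} e^{-ct} \tag{8}$$

and in general

$$B_j(t) = N \frac{c^{jk}}{\Gamma(jk)} t^{jk-1} e^{-ct}. \tag{9}$$

Summing the contributions of the several generations, we have for the total
renewal function

$$B(t) = N c e^{-ct} \left\{ \frac{(ct)^{k-1}}{\Gamma(k)} + \frac{(ct)^{2k-1}}{\Gamma(2k)} + \cdots \right\}. \tag{10}$$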


If k is a positive integer ≥ 3, (10) can be easily summed to a form which
shows immediately its damped periodic nature. Even if k is positive but not
an integer, it can be shown by continuity considerations that the function B(t)
defined by (10) has periodic properties.

Assuming k to be a positive integer, then, and setting z = ct, we may write
the expression in brackets in (10) as

$$\frac{z^{k-1}}{(k-1)!} + \frac{z^{2k-1}}{(2k-1)!} + \cdots = f(z). \tag{11}$$


















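Then

$$f^{(k)}(z) = f(z),$$

and upon making the trial substitution, $f(z) = Ae^{mz}$, we get

$$A m^k e^{mz} = A e^{mz}.$$

Hence,

$$m^k = 1.$$

Taking unity in its complex form

$$1 = \cos 2n\pi + i \sin 2n\pi$$

we have that

$$m_n = \sqrt[k]{1} = \cos\frac{2n\pi}{k} + i \sin\frac{2n\pi}{k} \tag{12}$$

where n = 0, 1, 2, ..., k − 1. Then

$$f(z) = \sum_{n=0}^{k-1} A_n e^{m_n z}$$

and

$$f^{(j)}(z) = \sum_{n=0}^{k-1} A_n m_n^j e^{m_n z}.$$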










Now setting z = 0, we get

$$f(0) = A_0 + A_1 + \cdots + A_{k-1} = 0$$

$$f'(0) = A_0 m_0 + A_1 m_1 + \cdots + A_{k-1} m_{k-1} = 0$$

$$\cdots$$

$$f^{(k-1)}(0) = A_0 m_0^{k-1} + A_1 m_1^{k-1} + \cdots + A_{k-1} m_{k-1}^{k-1} = 1$$

k equations to determine the k constants. We know that $A_n$ is equal to the
ratio of two determinants formed from the coefficients of the above equations.
This ratio reduces to

$$A_n = \frac{(-1)^{k-n-1}}{(m_{k-1} - m_n)(m_{k-2} - m_n) \cdots (m_{n+1} - m_n)(m_n - m_{n-1}) \cdots (m_n - m_0)}. \tag{13}$$


We have, then, an expression for the k constants in terms of the k roots of unity. 
Therefore, for any particular value of k we can obtain the sum of our series 
from the relation 


$$f(z) = \sum_{n=0}^{k-1} A_n e^{m_n z}.$$

Hence, under the assumption that k is a positive integer, we have

$$B(t) = N c e^{-ct} \sum_{n=0}^{k-1} A_n e^{m_n ct}. \tag{14}$$


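The forms of B(t) for k = 1, 2, 3, 4 are respectively

$$B(t) = Nc$$

$$B(t) = \tfrac{1}{2} Nc\,(1 - e^{-2ct})$$

$$B(t) = Nc\,e^{-ct}\left[\tfrac{1}{3} e^{ct} - e^{-ct/2}\left(\tfrac{1}{3}\cos\tfrac{1}{2}\sqrt{3}\,ct + \tfrac{1}{\sqrt{3}}\sin\tfrac{1}{2}\sqrt{3}\,ct\right)\right]$$

$$B(t) = Nc\,e^{-ct}\left[\tfrac{1}{4}(e^{ct} - e^{-ct}) - \tfrac{1}{2}\sin ct\right].$$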
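The closed forms for integer k can be checked mechanically. The sketch below is an illustration and not the author's computation: it evaluates (14) with the k-th roots of unity from (12), taking each constant as $A_n = 1/\prod_{l \ne n}(m_n - m_l)$. That product is the standard solution of a linear system of this Vandermonde type and is assumed here to be what the determinant ratio (13) expresses. The function name `renewal_integer_k` is hypothetical.

```python
import cmath
import math

def renewal_integer_k(t, k, c):
    """B(t)/N from equation (14) for a positive integer k."""
    # the k-th roots of unity m_0, ..., m_{k-1} of equation (12)
    roots = [cmath.exp(2j * math.pi * n / k) for n in range(k)]
    total = 0j
    for n, m_n in enumerate(roots):
        denom = 1 + 0j
        for l, m_l in enumerate(roots):
            if l != n:
                denom *= (m_n - m_l)
        a_n = 1 / denom  # assumed form of the constant A_n
        total += a_n * cmath.exp(m_n * c * t)
    # the imaginary parts cancel in conjugate pairs; keep the real part
    return (c * math.exp(-c * t) * total).real

# Compare with the closed form quoted for k = 2.
c, t = 0.7, 1.3
value_k2 = renewal_integer_k(t, 2, c)
closed_k2 = 0.5 * c * (1 - math.exp(-2 * c * t))
```

The same function can be compared with the expressions listed for k = 1, 3 and 4, which gives a check on the trigonometric forms as well.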


Although the above procedure is valuable particularly because it brings to 
light something of the nature of our renewal function, the forms derived above 
can be used actually to obtain values of B(t) for various values of t. However, 
for extensive numerical work a better method is at hand, which does not even 
depend on the assumption of an integral value for k. 

Let us return once again to equation (10) which may be written in the following
form

$$B(t) = N c \left\{ \frac{e^{-ct}(ct)^{k-1}}{\Gamma(k)} + \frac{e^{-ct}(ct)^{2k-1}}{\Gamma(2k)} + \cdots \right\}. \tag{15}$$










If k and c are determined by the method of moments (using two moments),
k will not, in general, be a positive integer. However, by using the Tables of
the Incomplete Gamma Function edited by Karl Pearson, one can compute values
of B(t) without much difficulty. In these tables the function I(u, p) is tabulated
for various values of u and p, where I(u, p) is defined by

$$I(u, p) = \frac{\int_0^{u\sqrt{p+1}} e^{-v} v^p\,dv}{\Gamma(p+1)}. \tag{16}$$

If we let $\xi = u\sqrt{p+1} = u'\sqrt{p}$ then upon integrating by parts we find

$$\frac{e^{-\xi}\xi^p}{\Gamma(p+1)} = I(u', p-1) - I(u, p). \tag{17}$$

The left hand member of this equation is of the same form as each of the terms
of the series in brackets in (15). Hence, the value of the renewal function for a
particular time, t, is directly obtainable by summation of the right hand member
of (17) for successive significant values of the argument p.
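Relation (17) can be verified numerically without the printed tables. The sketch below is illustrative only: it assumes the power series $P(a, x) = \sum_{n \ge 0} e^{-x} x^{a+n} / \Gamma(a+n+1)$ for the incomplete gamma ratio, and the helper names `reg_lower_gamma`, `pearson_I` and `series_term` are hypothetical.

```python
import math

def reg_lower_gamma(a, x, terms=200):
    """P(a, x) = (1/Gamma(a)) * integral_0^x e^{-v} v^{a-1} dv, summed from
    the series sum_{n>=0} e^{-x} x^{a+n} / Gamma(a+n+1); intended for moderate x."""
    if x <= 0:
        return 0.0
    return sum(math.exp(-x + (a + n) * math.log(x) - math.lgamma(a + n + 1))
               for n in range(terms))

def pearson_I(u, p):
    """Pearson's tabulated function I(u, p) of equation (16)."""
    return reg_lower_gamma(p + 1, u * math.sqrt(p + 1))

def series_term(xi, p):
    """Left hand member of (17): e^{-xi} xi^p / Gamma(p+1)."""
    return math.exp(-xi + p * math.log(xi) - math.lgamma(p + 1))

# One term of the series in (15), with illustrative values of xi = ct and p = k - 1.
xi, p = 4.62, 3.62
u = xi / math.sqrt(p + 1)
u_prime = xi / math.sqrt(p)
lhs = series_term(xi, p)
rhs = pearson_I(u_prime, p - 1) - pearson_I(u, p)
```

Here `lhs` and `rhs` should agree to rounding error, which is the content of (17).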

By way of illustration a numerical example will be considered. The data are 
taken from E. B. Kurtz’ book entitled Life Expectancy of Physical Property. 
In this book the author makes a study of retirement rates of fifty-two different 
types of physical property, and finds that their replacement curves fall into seven 
distinct groups. We consider here Group VII which happens to be the largest 
group, embracing seventeen different types of industrial equipment out of the 
fifty-two examined. Using Kurtz’ replacement data² we obtain for the values
of the first and second moments

$$\nu_1 = 10.002, \qquad \nu_2 = 121.71$$

and from these by the method of moments, we find

$$k = 4.62, \qquad c = .462.$$
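The step from the moments to k and c is not written out in the note. A minimal sketch, assuming the usual moment relations for the Type III density $\frac{c^k}{\Gamma(k)} x^{k-1} e^{-cx}$ (mean k/c and variance k/c²), reproduces the quoted values; the function name `type3_from_moments` is hypothetical.

```python
def type3_from_moments(nu1, nu2):
    """Fit k and c from the first two moments about the origin, using
    mean = k/c and variance = k/c^2 for the Type III (gamma) density."""
    variance = nu2 - nu1 ** 2
    c = nu1 / variance
    k = nu1 * c
    return k, c

k, c = type3_from_moments(10.002, 121.71)
print(round(k, 2), round(c, 3))  # 4.62 0.462
```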
We then proceed to calculate values of B(t)/N by means of Pearson’s Tables,³
obtaining the results shown in the following table.


² E. B. Kurtz, Life Expectancy of Physical Property, Ronald Press, 1930, Table 22, page 86.

³ With regard to the method of interpolation employed in the calculations, it should
be mentioned that it was found advisable to use the Mid-panel Central Difference Formula
(xxiii) on page xiii of the introduction to Pearson’s Tables; and that it is quite sufficient
for our purposes to calculate only first order terms.





 t    B(t)/N        t    B(t)/N

 0    .0000        10    .1049
 1    .0016        11    .1043
 2    .0103        12    .1028
 3    .0279        13    .1006
 4    .0486        14    .0990
 5    .0714        15    .0994
 6    .0867        16    .1009
 7    .0980        17    .1013
 8    .1039        18    .0992
 9    .1066        19    .0999
                   20    .0993
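With a modern log-gamma routine the series (10) can be summed directly, which gives an independent check on tabulated values such as those for t = 10 and t = 20. This is a sketch under the assumption that direct summation is acceptable in place of interpolation in Pearson's Tables; the function name `renewal_series` and the stopping tolerance are illustrative.

```python
import math

def renewal_series(t, k, c, tol=1e-15):
    """B(t)/N from series (10): c * sum_j e^{-ct} (ct)^{jk-1} / Gamma(jk), t > 0."""
    z = c * t
    total, j = 0.0, 1
    while True:
        term = math.exp(-z + (j * k - 1) * math.log(z) - math.lgamma(j * k))
        total += term
        # once jk exceeds ct the terms shrink; stop when they are negligible
        if j * k > z and term < tol:
            break
        j += 1
    return c * total

# Parameters fitted in the note: k = 4.62, c = .462.
b10 = renewal_series(10.0, 4.62, 0.462)
b20 = renewal_series(20.0, 4.62, 0.462)
```

For large t the values should settle near c/k = 0.1, the reciprocal of the mean life, which agrees with the trend of the tabulated figures.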


In conclusion the author wishes to thank Professor S. S. Wilks for various 
suggestions he has made in connection with this note. 


PRINCETON UNIVERSITY, 
Princeton, N. J. 


ESTIMATES OF PARAMETERS BY MEANS OF LEAST SQUARES 


By EVAN JOHNSON, JR.


As a criterion for comparing estimates of a parameter of a universe, of known 
type of distribution, the use of the principle of least squares is suggested. A 
criterion may be stated in rather general terms. Its application to any given 
problem presumes a knowledge of the distribution functions of the estimates 
considered. In the present paper a criterion is set up and application of it is 
made in the estimation of the mean and of the square of standard deviation of a 
normal universe. 

We shall use the symbol θ to represent a parameter to be estimated. It is
to be remembered that θ is a constant throughout any problem, that it represents
an unknown value, and that observations and functions of observations (called
estimates) are the only variables that occur. We shall use the symbols $x_i$,
i = 1, 2, ..., n, to represent observed values of the variable x of the universe, and
the symbol F to represent a given function of the observations $x_i$.

If we choose to consider a given function F as an estimate of θ, we are then
interested in the error $F - \theta$. This quantity differs from the so-called residual
of least square theory, since we are here interested in the difference between
computed and true values, rather than in the difference between observed and
computed values. To avoid any possible confusion we shall refer to $F - \theta$
as the error. Over the set of all samples of n observations, $x_i$, the distribution
of the errors $F - \theta$ is expressed by means of the distribution function f(F),
which may be computed from the known distribution function of the universe.
We shall assume that the function f(F) has been normalized, so that
$\int_\alpha^\beta f(F)\,dF = 1$, where the interval from α to β includes all possible values of F.
The integral $I = \int_\alpha^\beta (F - \theta)^2 f(F)\,dF$, associated with a given estimate F, may be
thought of as the average square error over the set of all samples.

In this notation we shall state a criterion for the judgment of estimates in 
either of the two following forms: 

DEFINITION 1. Let $f_1$ be the distribution function of $F_1$, and $f_2$ that of $F_2$.
The estimate $F_1$ of θ will be judged better than the estimate $F_2$ if

$$\int_\alpha^\beta (F_1 - \theta)^2 f_1(F_1)\,dF_1 < \int_\alpha^\beta (F_2 - \theta)^2 f_2(F_2)\,dF_2.$$

DEFINITION 2. From a given class of functions, of which F is a member, F will
be called the best estimate if

$$I = \int_\alpha^\beta (F - \theta)^2 f(F)\,dF \tag{1}$$


is less than the corresponding integral for all other functions of the class. 

It is to be observed that the integral I is a function of the quantities θ and f.
From this is seen at once the distinction between the present problem of minimizing
the average square error and the similar problem of finding that point
around which the mean square value of the deviations of a variable is a minimum.
In the problem under consideration we wish to find the function F, or more
precisely its distribution function f(F), for which I takes its minimum with a
fixed value of θ. In the alternative problem we have a given distribution f
and we wish to find the minimum of I with respect to θ.

A second observation to be made is that the integral I can not be usefully
minimized in the sense of the general conditions of the calculus of variations.
The problem would be of the isoperimetric variety, with the side condition
$\int_\alpha^\beta f(x)\,dx = 1$. A solution might be expressed as the limit, as a approaches zero,
of functions f(x) with proper continuity conditions, such that

$$f(x) \begin{cases} = 0 & \text{when } |x - \theta| \ge a, \\ > 0 & \text{when } |x - \theta| < a, \end{cases} \qquad \text{and} \quad \int_{\theta - a}^{\theta + a} f(x)\,dx = 1.$$

Such a solution would be meaningless in practical statistical theory. Solutions
are to be expected, therefore, only in those cases where the class of functions,
from which F is to be selected, is sufficiently restricted.

The two following examples illustrate both restrictions and possible applica- 
tion of the theory. 













As a first example let us consider the problem of finding an estimate F of the
mean, $\bar{x}$, of a normal universe. The mean of a distribution is a symmetric
linear function of the variates of the distribution. For the class of functions
from which to select an estimate F of $\bar{x}$, let us take the class of all symmetric
homogeneous linear functions of the observations $x_i$. Let

$$F = a(x_1 + x_2 + \cdots + x_n). \tag{2}$$

We wish to find the value of a, if any, for which I is a minimum.
F is the sum of n normally distributed independent variables, $a x_i$, each with
standard deviation $a\sigma$. F, therefore, has a distribution function

$$f = C \cdot \exp\left(-\frac{(F - an\bar{x})^2}{2a^2 n\sigma^2}\right),$$

where C is so chosen that $\int f\,dF = 1$. A discussion of general distribution functions
may be found in Dunham Jackson’s article, “Theory of Small Samples,”
in the American Mathematical Monthly, Volume XLII, 1935. In this case it
can be shown without particular difficulty that

$$I = C \int_{-\infty}^{\infty} (F - \bar{x})^2 \exp\left(-\frac{(F - an\bar{x})^2}{2a^2 n\sigma^2}\right) dF = a^2 n\sigma^2 + \bar{x}^2 (an - 1)^2.$$
To determine the minimum of I with respect to a, we set

$$\frac{\partial I}{\partial a} = 2an\sigma^2 + 2\bar{x}^2 (an - 1)n = 0,$$

and obtain

$$a = \frac{\bar{x}^2}{n\bar{x}^2 + \sigma^2} = \frac{1}{n} \cdot \frac{1}{1 + \sigma^2/n\bar{x}^2} = \frac{1}{n}\left(1 - \frac{\sigma^2}{n\bar{x}^2} + \cdots\right). \tag{3}$$

It is seen that for even such a simple example as the estimation of the mean
there is no estimate of the form of equation (2), with a independent of the parameter
to be estimated, for which I takes its minimum value.

For a distribution in which $\bar{x} \ne 0$, and $\sigma^2/\bar{x}^2$ is small, a is given as a first
approximation by 1/n. The function F is merely the mean of the sample observations.
If $\bar{x} = 0$, the required solution is a = 0, and there is no best least
square estimate of the type of equation (2).

In the case where $\sigma^2/\bar{x}^2$ is not small, as is apt to be the case when $\bar{x}$ is near
zero, the determination of a desirable estimate by least squares requires a knowledge
of the ratio $\sigma^2/\bar{x}^2$, which may perhaps be judged approximately in a special
problem. If this value is assumed known, the required value of a may be found
most easily by rewriting equation (3) in the form

$$a = \frac{1}{n + \sigma^2/\bar{x}^2}. \tag{4}$$
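A short numerical sketch may make the first example concrete. It is an illustration only: the sample size, mean and standard deviation are hypothetical, and the names `avg_square_error` and `best_a` are not from the paper. The functions evaluate $I = a^2 n\sigma^2 + \bar{x}^2(an - 1)^2$ and the minimizing value of a from equation (4).

```python
def avg_square_error(a, n, mean, sigma):
    """I(a) = a^2 n sigma^2 + mean^2 (a n - 1)^2 for F = a (x_1 + ... + x_n)."""
    return a * a * n * sigma ** 2 + mean ** 2 * (a * n - 1) ** 2

def best_a(n, mean, sigma):
    """Equation (4): a = 1 / (n + sigma^2 / mean^2); requires mean != 0."""
    return 1.0 / (n + sigma ** 2 / mean ** 2)

# Hypothetical universe: mean 2, standard deviation 3, samples of n = 10.
n, mean, sigma = 10, 2.0, 3.0
a_star = best_a(n, mean, sigma)
error_at_best = avg_square_error(a_star, n, mean, sigma)        # 36/49
error_at_mean = avg_square_error(1.0 / n, n, mean, sigma)       # sigma^2 / n
```

With these values the ordinary sample mean (a = 1/n) has average square error 0.9, while the value of a from (4) gives 36/49. The saving depends on knowing the ratio $\sigma^2/\bar{x}^2$, which is the difficulty the text points out.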

The second example to be considered is the determination of an estimate of
$\sigma^2$ of a normal universe. A comparison with the definition of $\sigma^2$ suggests the
use of a function F given by the equation

$$F = a\{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\}, \tag{5}$$

where $\bar{x}$ is the mean of the n observations. The value of a is, of course, to be
determined by minimizing the integral I.

F is the sum of the squares of n normally distributed but not independent
variables. It may be shown, however, (Jackson, loc. cit.) to be expressible as
the sum of the squares of n−1 independent normally distributed variables, each
with standard deviation $\sqrt{a}\,\sigma$. The distribution function for F takes the form

$$f(F) = C\,F^{(n-3)/2} e^{-F/2a\sigma^2}, \tag{6}$$


F taking only positive values, and C is again chosen to normalize f(F). The
integral I may be written

$$I = C \int_0^\infty (F - \sigma^2)^2 F^{(n-3)/2} e^{-F/2a\sigma^2}\,dF.$$

The integration is most easily accomplished by replacing F by $u^2$, and in terms
of u

$$I = 2C \int_0^\infty (u^2 - \sigma^2)^2 u^{n-2} e^{-u^2/2a\sigma^2}\,du.$$
0 


The various steps in the integration will differ for even and odd values of n,
but in each case the final result is the same. It is found that

$$I = \sigma^4 \{a^2(n^2 - 1) - 2a(n - 1) + 1\}. \tag{7}$$

The value of a which minimizes I is determined from the relation

$$\frac{\partial I}{\partial a} = \sigma^4 \{2a(n^2 - 1) - 2(n - 1)\} = 0.$$

Dividing by (n−1), which is not zero in a sample of two or more observations,
we obtain

$$a = \frac{1}{n + 1}. \tag{8}$$
In contrast to the previous example we have here an absolute minimum of I
with respect to all estimates of the type of equation (5). The best least square
estimate of this type is, therefore,

$$F = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n + 1}. \tag{9}$$
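The effect of the divisor can be seen by evaluating equation (7) exactly for a few choices of a. The sketch below is illustrative: it uses exact rational arithmetic, the sample size n = 10 is hypothetical, and the comparison divisors n and n − 1 are the familiar alternatives rather than anything discussed in the paper.

```python
from fractions import Fraction

def scaled_error(a, n):
    """I / sigma^4 from equation (7): a^2 (n^2 - 1) - 2a(n - 1) + 1."""
    return a * a * (n * n - 1) - 2 * a * (n - 1) + 1

n = 10
err_n_plus_1 = scaled_error(Fraction(1, n + 1), n)   # divisor n + 1, equation (8)
err_n = scaled_error(Fraction(1, n), n)              # divisor n
err_n_minus_1 = scaled_error(Fraction(1, n - 1), n)  # divisor n - 1
```

For n = 10 the three values are 2/11, 19/100 and 2/9, so the divisor n + 1 gives the smallest average square error of the three, in agreement with (8).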


PENNSYLVANIA STATE COLLEGE,
State College, Pa.







THE TEACHING OF STATISTICS¹
By HAROLD HOTELLING


The very great increase in the teaching of statistics since the First World 
War has been associated on one hand with the development of statistical theory. 
This important series of discoveries has made available more and more power- 
ful and accurate statistical methods, and has also acquired an intellectual 
interest of its own as embodying the modern version of the most important 
part of inductive logic and as providing scope for mathematical and logical 
ingenuity of high order. The increased teaching of statistics has also been 
associated with the rapidly growing applications of statistics in innumerable 
fields, made possible by the development of the theory, by the availability of 
persons having some knowledge of the theory, and by an increasing realization 
of the possibilities of application. Doubtless most students of statistics enter 
upon the subject, not for its intrinsic interest, but with the idea of applying 
statistical methods as a tool to some particular end. This object may be 
scientific research, or to fulfill a requirement for a degree, but is often connected 
with some purely practical pursuit offering the ready prospect of a remunerative 
job. But it would be a mistake to ignore those whose interest is more purely 
intellectual, who desire an insight into the peculiar problems of probable in- 
ference and the structure of empirical knowledge, who wish to get a fundamental 
acquaintance with one of the most fundamental of subjects, to see and under- 
stand fully the mathematical derivations underlying so much practical and 
scientific activity, and perhaps to make their own contributions. 

Of the magnitude of the demand for statisticians there can be no doubt. 
The realization of what statistical methods can do in a multitude of fields has 
gradually led the administrators of government agencies, directors of scientific 
organizations and research institutes, and business men, to employ rapidly 
increasing numbers of persons with some knowledge of statistical methods, and 
to accord an unusual degree of recognition and promotion in many such cases. 
The uses of statistical methods, and especially of sampling theory, are so varied 
that it is scarcely possible in a brief space to give any sort of survey of them. 
They enter, in one form or another, into the research work of the physicist, the 
chemist, the astronomer, the biologist, the psychologist, the anthropologist, 
the medical investigator, the economist, and the sociologist. Meteorology, 
which has lately acquired greatly increased importance, both civil and military, 
is with its masses of numerical observations very much a statistical matter. 
The engineer needs modern statistical methods both in the physical and in the 








¹ Address at the meeting of the Institute of Mathematical Statistics at Hanover, N. H.,
September 10, 1940.




economic aspects of his plans. The work of W. A. Shewhart has made clear 
the central importance of sampling theory in the economic control of quality 
of manufactured articles. Business men who use sampling surveys to test 
the markets for their products and the effectiveness of their advertising, who 
employ statisticians to make up index numbers and forecasts of business condi- 
tions, and whose manufacturing costs and quality are controlled with the 
help of recently devised statistical methods, are finding more and more uses for 
statisticians. Indeed, it seems as if the exploitation of the business and
manufacturing possibilities of statistical methods has only begun, and that limitless
further fields are coming into view. Insurance has of course always been essen- 
tially dependent on statistics. 

But the most rapidly growing large group of positions for statisticians is at
present in governmental activities. For some facts regarding the employment 
of statisticians by the federal government I am indebted to Dr. J. M. Thomp- 
son. It appears that it has about one hundred agencies using statistics, with 
almost eight hundred positions broadly classified as statistical or mathematical, 
in addition to more than six thousand generally classified as economists. The 
title ‘“economist”’ covers many types of work, but much of it is largely statis- 
tical. The nature of the government’s statistical work is varied and extensive. 
It includes such work as forecasting revenue from taxes, prices and production 
of agricultural commodities, general demand conditions, and weather. Some 
of the work consists in analyzing the effects of various taxes on other programs. 
In connection with proposed legislation, statisticians serving the lawmakers 
often attempt to outline the probable results of the legislation, as well as to 
assist in setting up definite formulae for carrying out the general policies aimed 
at in Acts of Congress. Administrators as well as lawmakers require statistical 
activities of a high order, exemplified in the Bureau of the Census, the Bureau 
of Agricultural Economics, and others. The scientific activities of the govern- 
ment, the work of the War Department, and many others that do not at first 
sight appear at all statistical, require the services of mathematical statisticians 
of high order. Even the judicial activities call for statistical theory of some 
of the most recently discovered kinds, as for instance in the investigation re- 
cently made of parole procedures. Cities and states, school and port authori- 
ties, employ numerous statisticians for other and widely diverse purposes. 

The growing need, demand and opportunity have confronted the educational 
system of the country with a series of problems regarding the teaching of statis- 
tics. Should statistics be taught in the department of agriculture, anthro- 
pology, astronomy, biology, business, economics, education, engineering, 
medicine, physics, political science, psychology, or sociology, or in all these 
departments? Should its teaching be entrusted to the department of mathematics,
or to a separate department of statistics, and in either of these cases
should other departments be prohibited from offering duplicating courses in 
statistics, as they are often inclined to do? To what students, and at what 
stage of their advancement, should a course in statistics be administered? 

























Should there be mathematical or other prerequisites? How much of an in- 
vestment in a statistical laboratory is warranted? Should courses be primarily 
theoretical and mathematical, or should they be made as practical as possible, 
equipping the student in the shortest possible time for a job as statistician, or 
for statistical work in the field with which a particular department is concerned?
What about degrees in statistics? Eclipsing all these in importance,
though it seems to have received too little of the attention of college and
university administrative officers, is the question, What sort of persons should be
appointed to teach statistics?

To pressing practical problems answers are sure to be given either by con- 
sidered policy or by processes of historical evolution. The latter are the more 
prominent in explaining the statistical teaching we have had. A synoptic 
picture of the origins, not many decades ago, of a good deal of it would perhaps 
be something like this. A university Department of X, where X stands for 
economics, psychology, or any one of numerous other fields, begins to note 
toward the end of the pre-statistical era that some of the outstanding work 
in its field involves statistics. The quantity and importance of such work are 
observed to increase, while at the same time its intelligibility seems to diminish. 
Evidently students turned out with degrees in the field of X who do not know 
something about statistics are going to be handicapped, and are not likely to 
reflect credit on Alma Mater. The department therefore resolves that its 
students must acquire at least an elementary knowledge of the fundamentals 
of statistics. To implement this principle, it perhaps inserts some acquaint- 
ance with statistics among the requirements for a degree. This situation 
naturally calls for the introduction of a course in statistics. Accordingly the 
head of the Department of X, in preparing the next Announcement of Courses, 
writes: 























“X 82. Elements of Statistics. An elementary but thorough 
course designed to acquaint students of X with the fundamental con- 
cepts of statistics and their applications in the field of X. The view- 
point will be practical throughout. Second semester, MWF at 10. 

“Instructor to be announced.” 


The problem now arises of finding someone to teach the new course. The 
few well-known statisticians in the country have positions elsewhere from which 
it would be impossible to dislodge them with the bait to be offered; for though 
the department wishes to have statistics taught as an auxiliary to the study of 
X, it feels that there must be no question of the tail wagging the dog, and that 
economy is appropriate in this connection. The members of the department 
of professorial rank do not respond favorably to the suggestion that they should
themselves undertake to teach the new and unfamiliar course. But every 
university department has a bright graduate student whose placement is an 
immediate problem. Young Jones has already demonstrated a quantitative turn 
of mind in the course on Money and Banking, or in the Ph.D. thesis on which 




he has already made substantial progress, dealing with The Proportion of 
Public School Yard Areas Surfaced with Gravel. He may even recall having 
had a high-school course in trigonometry. His personality is all that might 
be desired. He is a white, Protestant, native-born American. And so the
“Instructor to be announced” materializes as Jones. 

This earnest young scholar now finds that, in addition to completing his 
thesis, he must look up the literature of statistics and prepare a course in the 
subject. His attention is directed by older members of the department to 
some of the research papers in the field of X involving statistics. He pursues 
“statistics” through the library card catalog and the encyclopedias. He reads 
about census and vital statistics, price statistics, statistical mechanics. Per- 
haps he encounters probable errors. Eventually he learns that Karl Pearson 
is the great man of statistics, and that Biometrika is the central source of infor- 
mation. Unfortunately most of the papers in Biometrika and of Pearson’s 
writings, while not lacking in vigor, trail off into mathematical discourse of a 
kind with which young Jones feels ill at ease. What he wants is a textbook, 
couched in simple language and omitting all mathematics, to make the subject 
clear to a beginner. Perhaps he finds the impressive books of Yule and Bowley, 
but decides that they are too abstruse. Elderton’s ‘Frequency Curves and 
Correlation” is far too mathematical. Jones decides that a simple book on 
statistics must be written, and that he will do it if he can ever succeed in master- 
ing the subject. In the meantime, he contents himself perforce with the less 
mathematical writings of Karl Pearson, with applied examples in the field of X, 
and with such nonmathematical textbooks as may have been written by other 
young men who have earlier trod the same path as that on which Jones is now 
beginning. Somehow or other he gets the class through the course. After 
doing this two or three times, Jones is an experienced teacher of statistics, and 
his services are much in demand. His course expands, takes on a settled form, 
and after a while crystallizes into a textbook. At the same time he may be 
getting out some research, consisting of studies in the field of X in which statis- 
tical methods play a part. His promotion is rapid. He becomes a Professor 
of Statistics, and perhaps an officer in a national association. His textbook 
has a large sale, and is used as a source by other young men writing textbooks 
on statistics. 

The textbooks written in this way form an interesting literary cycle. Meas- 
ures of “central tendency” and of dispersion are introduced, and the use of 
one as against another of these measures is debated on every ground except 
the criterion that modern research has shown to be the important one, the 
sampling stability. Sampling considerations, indeed, get little attention. 
The urge to simplify by leaving out the more difficult parts of the subject, and 
especially the mathematical parts, is accompanied by pride in the great number 
of examples drawn from real life, that is, actual data that have been collected. 

But the most fascinating feature of this literary cycle is the opportunity it 
offers for research by the standard methods of literary investigation, tracing the 





















influence of one author upon another through parallelism of passages, and so 
forth. This study is facilitated by the accumulation of errors with repeated
copying. One outstanding example is in certain formulae connected with the 
rank correlation coefficient, derived originally by Karl Pearson in 1907 and 
copied from textbook to textbook without adequate checking back. As one 
error after another was introduced in this process, the formulae presented to 
students (and apparently made the basis of class exercises involving numerical 
substitution) became less and less like Pearson’s original equations. Inci- 
dentally, in trying to check this original work of Pearson’s, recent investigation 
has raised the suspicion that it is erroneous; at any rate, he does not give a fully 
adequate argument. Thus it may be that the errors in copying, which are so 
useful in examining the history of statistics, never did any harm. The formulae 
in which the students were drilled may have been no worse than they would 
have been if all the copying had been done with more care. 

While this process has been going on in the Department of X, the Y and Z 
Departments have likewise evolved the teaching of statistics. There is some 
interchange of ideas between the various statisticians on the campus, and there 
is a catholicity in the copying of textbooks. But by and large, statistics is 
regarded in the Economics Department as a branch of economics, in the Psy- 
chology Department as a part of psychology, and so forth. The astronomer is 
inclined to resent the suggestion that his students should be called upon to study 
their least squares with anyone but an astronomer. Medical and biological 
investigators suspect Economics and Psychology of charlatanry, and do not 
look with favor on the idea of turning their own students over to such depart- 
ments for instruction in statistics. Most unthinkable of all would be putting 
the Department of Education in charge of an essential part of the training of 
scientific students. Thus the courses multiply. 

The fact that it is essentially the same fundamental subject that is being 
taught under various names and with various kinds of notation in different 
departments is often concealed by including the teaching of statistical theory 
in a course whose title and prospectus are more suggestive of applications. A 
case in point is that of an economist of my acquaintance, not primarily engaged 
in teaching, who some years ago was invited to give a course in Price Forecasting 
in the Economics Department of a leading university. He carefully prepared a 
series of lectures on this subject, which had been the center of some extended 
research he had conducted. A large class enrolled for the course. But soon 
after beginning his series of lectures the economist noticed that the class was 
growing restive. Upon inquiring what was amiss, he learned that his discourse 
was unintelligible to many of them because he was using technical statistical 
terms and concepts with which they were not familiar. He thereupon under- 
took to use simpler language, and when this did not suffice to convey his mean- 
ing, to explain the statistical notions involved in his work on price forecasting. 
More and more his lectures came to deal with the elements of statistics, and less 
and less with price forecasting. At the end of the term he felt that he had 






























given the students some elementary knowledge of statistical theory, for which 
they had not enrolled and for which he did not feel particularly well qualified, 
but had taught them virtually nothing about price forecasting. When the 
invitation was repeated the next year, the economist suggested imposing a course 
in statistics as a prerequisite for the course in Price Forecasting. This however
was vetoed by the head of the Economics Department, who did not believe in 
prerequisites. The Price Forecasting course was not repeated. 

This incident illustrates the evolution of a good deal of statistical teaching. 
At the beginning, the idea is to teach some application, but the teacher soon 
finds himself engaged at much more length than expected with the fundamentals 
of statistical theory and methods. In this way it has come about that a large 
number of persons are teaching theoretical statistics who initially had no inten- 
tion of doing so, but were concerned with particular applications. The teach- 
ing of statistical theory has been undertaken belatedly and inexpertly because 
it was necessary to a discussion of some application originally in view. Thus 
it happens that a good deal of teaching of statistics, even of mathematical 
statistics, masquerades as something else. 

The obvious inefficiency of overlapping and duplicating courses given inde- 
pendently in numerous departments by persons who are not really specialists 
in the subject leads to the suggestion that the whole matter be taken over by the 
Department of Mathematics. This is a promising solution, but it is doomed to 
failure if, as has sometimes happened, it means that the teaching of statistics 
is put under the jurisdiction of those who have no real interest in it. Moreover 
the teaching of statistics cannot be done appreciably better by mathematicians 
ignorant of the subject than by psychologists or agricultural experimenters 
ignorant of the subject. The latter indeed have a certain advantage in that the 
problems seem more real and definite to them; they can sense the difference 
between the important and the unimportant questions, even if they cannot 
express the questions in clear mathematical language, and can sometimes arrive 
intuitively at a correct result that leaves the mathematician puzzled. Also, 
they can understand more readily than can the mathematician the examples, 
drawn largely from biological material, which play so important a part in some 
of the leading expository work on statistics, such as R. A. Fisher’s Statistical 
Methods for Research Workers. The pure mathematician has only one advan- 
tage over the non-mathematical worker in empirical fields: he is able to set about 
reading the serious literature of statistical theory. But he must still find this 
scattered literature, sort it out from a mass of rubbish, fallacies, and false starts, 
and trace it back historically until he can understand the notation and the pre- 
suppositions. He must also contend with the fact that a good deal that is im- 
portant in statistics is still a matter of oral tradition, and some consists of lab- 
oratory techniques. In short, he needs a teacher before he himself sets out to 
teach the subject. When a Department of Mathematics calls in a young Ph.D., 
however brilliant, to teach statistics as a part or all of his program, the best 
thing it can do, if he has not already had a training in modern statistics, is to
give him a furlough for a year or two to enable him to go where he can acquire
such a training. 

Qualifications of a good teacher of statistics include, first and foremost, a 
thorough knowledge of the subject. This statement seems trivial, but it has 
been ignored in such a way as to bring about the present unfortunate situation. 
Mathematicians and others, who deplore the tendency of Schools of Education 
to turn loose on the world teachers who have not specialized in the subjects they 
are to teach, would do well to consider their own tendency to entrust the teach- 
ing of statistics to persons who not only have not specialized in the subject, 
but have no sound knowledge of it whatever. A knowledge of theoretical 
statistics is not easy to obtain. There is no comprehensive treatise on the sub- 
ject, starting from first principles, and proceeding by sound deductions and 
well-chosen definitions to the methods that need to be used in practice. (I 
have been trying for years to write such a treatise, but it has turned out to be a 
bigger task than at first appeared. This is partly because some things formerly 
thought to have been proved turn out, on critical examination, not to be sound, 
and much new research has been necessary.) The literature is scattered through 
journals pertaining primarily to many kinds of applications, and it is only in 
recent years that any large proportion of the current contributions to statistical 
theory and methods have been gathered into a few periodicals devoted to sta- 
tistical theory. On the other hand, the seeker after truth regarding statistical 
theory must make his way through or around an enormous amount of trash 
and downright error. The great accumulation of published writings on statis- 
tical theory and methods by authors who have not sufficiently studied the sub- 
ject is even more dangerous than the classroom teaching by the same people. 

A good teacher of statistics needs of course a mathematical background, in- 
cluding at least an acquaintance with the theory of functions and n-dimensional 
euclidean geometry. A good deal of additional algebra and analysis are likely 
to be helpful, as well as some differential geometry. But no amount of such 
mathematics constitutes by itself any approach to sufficiency in the qualifica- 
tions of a teacher of statistics. The most essential thing is that the man shall 
know the theory of statistics itself thoroughly from the ground up, including 
the mathematical derivations of proper methods and a clear knowledge of how 
to apply them in various empirical fields. In addition to the pure mathematics 
and the knowledge of statistical theory, a competent statistician or teacher of 
statistics needs a really intimate acquaintance with the problems of one or more 
empirical subjects in which statistical methods are applied. This is quite im- 
portant. Sometimes excellent mathematicians have wasted time and misled 
students through failure to get that feeling for applications that is necessary for 
proper statistical work. 

The theory of statistics has been making advances so rapid and so fundamental 
that some of the first things that need to be said in an elementary course, even 
for prospective practical statisticians, are affected by some of the most recent 
researches. So elementary a question as “What definition is it wise to give to
the term ‘standard deviation’?”, which must be faced by every teacher of
Statistics 1, requires for an intelligent answer a rather thorough understanding 
of modern sampling theory and techniques. The answer, it now seems, is 
not the definition given in most textbooks. In the selection of a statistic to 
represent a parameter, for example in fitting frequency curves or in linkage 
estimation in genetics, the fundamental consideration is connected with the 
sampling distribution, as R. A. Fisher showed in founding the modern theory of 
estimation. This is ignored in most of the current teaching of statistics, with 
the result that innumerable students are sent out to waste the money and time 
of their employers by demanding larger samples than are necessary for the purposes
in view, wasting costly information by calculating inefficient statistics
and using tests that are not the most powerful. On the other hand, students of 
statistics who are taught rule-of-thumb methods without their derivations are 
never quite conscious of the exact limitations and assumptions involved, and 
may make unwarranted inferences from samples that are too small or in some 
way violate the conditions underlying the derivations of the formulae. 
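
The cost of an inefficient statistic can be made concrete with a small simulation. The sketch below is an illustration added here, not part of the address: it assumes normally distributed observations and compares the sample mean with the sample median as estimates of the population center. The function name `relative_efficiency`, the sample size, the number of replications, and the seed are all arbitrary choices made for the example.

```python
import random
import statistics


def relative_efficiency(n=25, replications=4000, seed=1940):
    """Estimate var(sample mean) / var(sample median) for normal samples of size n.

    A ratio below 1 means the median is the less efficient of the two
    statistics when the observations really are normal.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means, medians = [], []
    for _ in range(replications):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means) / statistics.variance(medians)


eff = relative_efficiency()
print(f"estimated efficiency of the median relative to the mean: {eff:.2f}")
print(f"sample size needed with the median, relative to the mean: about {1 / eff:.2f}x")
```

For large normal samples the ratio tends to 2/π, roughly 0.64, so a worker who uses the median where the mean is appropriate needs a sample more than half again as large to reach the same precision. Under other distributional assumptions the comparison can come out differently.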

A good teacher of statistics must be thoroughly familiar with these recent 
advances. He must examine very critically textbook statements unsupported 
by full proofs. Even though the students are not capable of following the 
complete mathematical argument—indeed, especially if the students are not to 
examine it—the instructor needs to give it a critical study. The custom of 
omitting proofs, which would not be tolerated in pure mathematics beyond 
a very limited extent, is common in the teaching of statistics, and is excused on 
the ground that the students do not know enough mathematics to understand 
the proofs. Perhaps in some cases a better reason is that the teachers, and the 
authors of the textbooks, do not understand the proofs. In some instances 
no proofs exist, and in some instances no genuine proofs can exist, because the 
methods taught are demonstrably wrong. The custom prevalent in the teach- 
ing of mathematics of going over each proof carefully in the class is, among other 
things, a safeguard against infiltration of false propositions. This safeguard is 
missing from most of the teaching of statistics, and there has been an infiltration 
of errors. Since it is accepted that a great many students need to learn some- 
thing about statistical methods without learning enough mathematics to under- 
stand the proofs, it follows that the elementary teaching of statistics to these 
students must, if the perpetuation of gross errors is to be avoided, be in the 
hands of really competent mathematical statisticians. This is perhaps the 
greatest reform needed in the teaching of statistics today. Until the elementary 
teaching of statistics is conducted by those with a thorough and critical knowl- 
edge of current research in statistical theory, of a sort that seems virtually 
inseparable from participation in that research, there is likely to be a continua- 
tion of the laborious drilling of thousands of students in methods that ought 
never to be used. Here, of all places, is the great need for participation of 
research workers in elementary teaching. 

Teachers and textbook writers might well abandon the idea of telling what
statistical methods are used, and say instead what methods ought to be used.
But before they can do this with confidence they must have a very close ac- 
quaintance with the research of the last three decades in statistical theory. 

How can an appointing officer know whether a prospective teacher of statistics 
knows his subject? This question requires no answer peculiar to statistics in 
distinction from other subjects. Publication of research, constituting a contribution
to the particular field, has always been accepted as the best proof. A
substantial contribution to fundamental statistical theory, which is to be dis- 
tinguished from the mere application of known statistical methods to empirical 
data, is the best indication of the kind of scholarship appropriate to a teacher of 
statistics. 

Participation in research is not novel as a criterion of what constitutes a good 
teacher of a college or university subject, if the subject is Greek literature, 
physics, chemistry, biology, or indeed any of those departments that have been 
long enough established to attain with respect to the organization of their teach- 
ing a state approximating equilibrium. The more reputable institutions of 
higher learning have long maintained the principle, though with occasional 
violations in practice, that the Ph.D. degree or its equivalent, representing among
other things the completion of a piece of scholarly research, is a minimum 
condition for a regular faculty appointment. It has usually been maintained 
also that the Ph.D. thesis should be a new contribution of a strictly scholarly 
character to the field of the scholar’s competence, and not merely a routine 
application of known methods to an extraneous field. Thus a thesis offered for 
the Ph.D. degree in mathematics would be judged by its contribution to mathe- 
matics, rather than to physics or accounting. Moreover the regard in which 
universities have held members of their faculties has been intimately connected 
with their output of scholarly research. Other criteria of excellence have not 
been ignored, but research has been recognized in a fairly consistent manner. 
Some say that there has been an over-emphasis on research, and that more at- 
tention ought to be given to other qualities related to teaching. However 
this may be, the facts remain that scholarly research is something capable 
of a reasonably objective evaluation by scholars in the field, that it offers the 
main hope of fundamental progress, and that familiarity with current research 
is a necessary, though not sufficient, condition for the most important teaching 
in institutions of higher learning. 

A peculiarity of the teaching of statistics, of which in practice the theory of 
statistics is an essential even if unacknowledged part, is that a good deal of it 
has been conducted by persons engaged in research, not of a kind contributing to 
statistical theory, but consisting of the application of statistical methods and 
theory to something else. A similar situation would exist if the teaching of 
mathematics were in the hands of an assortment of various kinds of engineers, or 
if zoology and botany were taught by practicing physicians. The teaching of 
mathematics and of elementary biology might perhaps gain in liveliness and 
concreteness by such arrangements, with the accompanying emphasis on the
particular applications of the fundamental sciences. Moreover the engineer
might in the course of such teaching refresh his own knowledge of elementary 
mathematics, while the physician might gain by renewing his acquaintance with 
elementary biology. Such arrangements might occasionally be made with 
profit. But if they were the general rule the advantages of specialization would 
be lost; the fundamental sciences would not be developed in so well-rounded a 
manner as they are by specialists in them, while the special skills and knowledge 
of the physician and engineer could not be utilized to the full in their respective 
professions. Statistical theory is a big enough thing in itself to absorb the full- 
time attention of a specialist teaching it, without his going out into applications 
too freely. Some attention to applications is indeed valuable, and perhaps 
even indispensable as a stage in the training of a teacher of statistics and as a 
continuing interest. But particular applications should not dominate the 
teaching of the fundamental science, any more than particular diseases should 
dominate the teaching of anatomy and bacteriology to pre-medical students. 
These subjects are not ordinarily taught by practicing physicians, but by anat- 
omists and bacteriologists respectively. 

In medical education the principle has been accepted, after a long struggle, 
that a medical school should have full-time professors engaged primarily in 
teaching and research, and that such professors should not treat patients except 
in cases of unusual interest from the standpoint of the science or art of medicine. 
An analogous principle would be that an institution offering extensive instruc- 
tion in statistics should have full-time professors engaged in the teaching of and 
research in statistical theory and methods, without spending time over applied 
statistical problems excepting insofar as such problems might present novel 
features calling for the development of new statistical methods or theoretical 
extensions having interest going beyond the immediate case. Sometimes the 
complaint is heard in medical schools that the teaching tends to become too 
theoretical on account of detachment from clinical practice, and a similar diffi- 
culty might conceivably develop in connection with statistics; but in neither 
case does the trouble seem to be beyond the ability of the personnel involved to 
cure if they have the right background. 

A specialist in statistics on a university faculty has a threefold function. In 
addition to the usual duties of teaching and research, there is a need for him to 
advise his colleagues, and other research workers, regarding the statistical 
methods appropriate to their various investigations. The advisory function is 
a highly important one for the activities of the university as a whole, and should 
be taken into consideration in adjusting the teaching load. Probably every 
university statistician is visited from time to time by earnest research workers, 
deeply engrossed in their respective specialities, speaking technical jargons un- 
familiar to the statistician, and seeking his advice on matters concerning which 
he has a sinking feeling of lack of comprehension. After some hours of psycho- 
analyzing his visitor the statistician may be able to ascertain what it is he really 
wants to know, and thereafter either refer him to some standard formula, or
more often, undertake a piece of new mathematical research designed to fit
the particular problem, and very possibly having value also for a more extended
class of problems. The statistician is then very likely to find himself embarked 
on a co-operative research venture in a field that is new to him. 

To function well in this third, the consultative or co-operative function, he 
must have an unusually large store of general information. No one stands in 
greater need than he of that knowledge of “something about everything and 
everything about something” that was once said to be the goal of a liberal 
education. In planning the education of statisticians and teachers of statistics 
these considerations point to a somewhat wider diffusion of studies among vari- 
ous fields than is customary in many institutions, especially in graduate work. 
The co-operation, and their other work, would also be facilitated if research 
workers in general were more strongly urged to get a training in mathematical 
statistics at an early stage in their careers. 

The problem of departmental organization is secondary to that of getting men 
having the requisite qualities of extensive mathematical preparation, a thorough 
knowledge of modern theoretical statistics, an understanding of some fields at 
least in which statistical methods can be applied, and the type of inquiring 
mind sometimes described as a “research outlook.” A Department of Mathematics
may well handle the fundamental teaching in statistics, provided it has
men properly qualified for such teaching. If it does not have such men, its 
teaching of statistics and its inability to provide the needed statistical advice 
will inevitably tempt the other departments to set up again their own duplicat- 
ing courses in what amounts essentially to statistical theory and methods, and 
to repeat the mistakes of the past. 

A separate Department of Statistics, if competently staffed, could very well 
provide advice for the whole institution as well as conducting elementary in- 
struction in statistical methods and theory, both for students having calculus 
and for those without it, and should certainly carry on advanced teaching and 
research in statistical theory and methods. But for efficient functioning of the 
institution as a whole it should be agreed that the Department of Statistics or 
the Department of Mathematics should do all the elementary instruction in 
statistics, and that courses in statistics in other departments should be confined 
to applications of the basic theory. Normally such courses in applied statistics 
in the other departments should require as a prerequisite one or more of the basic 
courses in the Department of Statistics, or of Mathematics. The basic course 
to be required as a prerequisite to others should be the one which itself requires 
calculus as a prerequisite wherever this is practicable. It is practicable for 
students of engineering, physics, astronomy, and mathematical economics, since 
these students must have calculus anyhow. Moreover the value of the se- 
quence consisting of calculus, statistical theory and applied statistics, in this 
order, is so great that many other students are likely to avail themselves of it 
when it is once established and the true nature and value of statistics are more 
widely understood. 










Exactly how far a Department of Statistics should go in particular applications
would have to be decided anew from time to time by its members in the
light of changing conditions and interests. It cannot teach everything that goes 
by the name of statistics. This problem may be exemplified by the case of 
population and vital statistics. This is a field with close connections with so- 
ciology, biology, medicine and insurance. It is cultivated in conjunction with 
each of these subjects in various places. Some of its most interesting and im- 
portant phases make use of quite advanced mathematics, as in the work of 
A. J. Lotka, and in addition there is extensive use, and more extensive need, of 
the statistical methods centered around sampling theory which are the appro- 
priate domain of a Department of Statistics. Should the study of population 
and vital statistics be included in a Department of Statistics? I think not, 
except as a temporary arrangement, or in a small institution, in spite of the 
history of the word “statistics,” which originated in connection with material
of this kind, and in one of its meanings is still applied to it. (My use of the 
unqualified word “statistics” in this paper is in the sense of theory and methods, 
not in the sense of statistical facts such as those found by the census.) Medical, 
biological and sociological considerations are prominent in the problems of vital 
statistics, and one of these departments might well handle the subject. But 
the vital statistician, like other research workers, should have acquired in the 
course of his training an intimate familiarity with the statistical theory and 
methods which are the appropriate province of a Department of Statistics. 
He also needs mathematics through integral equations, if he is to understand and 
extend the contributions of Lotka and Volterra. Students of vital statistics 
should have had an elementary course in statistical theory in the Department of 
Statistics, preferably the course requiring calculus. 

A course in price statistics should be taught by an economist, presumably in 
the Department of Economics, but might well require as a prerequisite the same 
elementary courses in statistical theory and methods as would be required in 
psychology, medicine and other fields. In addition, there are problems of time 
series analysis whose treatment calls for a mathematical statistician having some 
acquaintance with both economic and meteorological data. A course on the 
treatment of time series might appropriately be included in the Department of 
Statistics, requiring the general elementary course as a prerequisite, and itself 
serving as a prerequisite for courses in economic and meteorological statistics. 

One of the chief obstacles to efficient organization of teaching is the habit of 
not prescribing prerequisites outside one’s own department. But when once 
the elementary courses in statistics have become established in the hands of well- 
equipped specialists in statistical theory and methods, in whose competence 
general confidence can be reposed, the various departments of application will 
lose their motive for establishing their own duplicating courses, and will be able 
to cultivate more intensively their respective specialities. 

The detection of biases and the details of practical statistical work vary greatly
from one application to another. These, consequently, are matters for the departments
concerned with applications rather than with the fundamentals of
statistics, and should not be the chief features of a course in elementary statis- 
tical methods and theory. The work of a Department of Statistics should be 
concerned largely with sampling theory, and should emphasize the unity of 
statistical methods and theory, regardless of the field of application. It should 
deal with statistics as a coherent science of inductive inference, of the preparation
of observations for inference, and of the planning of investigations so as to
yield observations from which inferences can best be made. 

The question what mathematical prerequisites should be established for the 
fundamental course in statistical theory must be answered by a compromise 
between the ideal and what is expedient at a particular time and place. In 
Europe a large number of students have had a year of calculus before coming to 
universities, that is, before reaching the age of eighteen. If a university were 
willing to restrict its entrants to such students (thus automatically solving the
problem of overcrowding) it could give them another year of calculus, mixed
perhaps with advanced algebra and geometry, and then in their sophomore year
give them a thorough course in elementary statistics and probability, based on 
calculus. These students would then be ready to tackle advanced statistics in 
the third year in a really effective way. If the teaching of economic theory, 
physics, chemistry and astronomy were geared to this program in such a way as 
to make real use of the calculus, the work in these subjects could be made far 
more efficient, in the sense that more material could be covered effectively in 
the allotted time, or an equivalent amount of material in less time. If, in addi- 
tion, all the many departments in which statistical methods and theory are used 
required these statistical courses as prerequisites, and actually used the mate- 
rials of these courses in their work, there would be a further huge gain in effi- 
ciency. The baccalaureate degree of such an institution would represent a far 
more thorough knowledge, and command of the tools of research, than is possible 
without an arrangement putting in this way the fundamentals first.

Institutions unwilling to undertake such a drastic improvement must face 
more or less delay and inadequacy in the acquisition by their students of the 
fundamentals of mathematics and of statistics. A division of the students into 
groups according to mathematical ability ought to be undertaken, and followed 
by a corresponding division of the elementary statistics course. Students having 
high mathematical ability could begin the study of statistics after completing 
calculus, and could look forward to rising ultimately to greater heights in pur- 
suits involving mathematical or statistical knowledge than those of lesser mathe- 
matical talents. For these latter there would still be the possibility of acquir- 
ing, even without calculus, useful statistical tools; but it is essential that this 
should be done under the guidance of instructors thoroughly familiar with the 
mathematics of statistics. The task of leading the blind must not be turned 
over to the blind. Students possessing the ability to master the calculus should
be encouraged to begin the study of statistics with the course having calculus
as a prerequisite, and should not be put into the necessarily slower group not 
having the calculus. I believe that these elementary courses should begin with 
the theory of probability, but should go on to the chief distribution functions 
used in practice, and should include applied problems and work on calculating 
machines. 

Putting a sound program of statistical teaching into effect will take time, 
partly because of the scarcity of suitable teachers of statistics. Nevertheless, 
the process is well under way, and the prospects are good for substantial im- 
provements in the teaching of statistics. A body of able young research men 
possessing the requisite knowledge of statistical fundamentals is now in existence 
and is growing. Some of the recent textbooks represent striking improvements. 
The Institute of Mathematical Statistics itself, with the Annals of Mathematical 
Statistics, is perhaps the best evidence of a changed view making for better 
things. 

COLUMBIA UNIVERSITY,
New York, N. Y. 


DISCUSSION OF PROFESSOR HOTELLING’S PAPER 
By W. EDWARDS DEMING


It is a pleasure to endorse Professor Hotelling’s recommendations; in fact we 
have been following them pretty closely in the courses in the Graduate School 
of the Department of Agriculture. As a matter of fact, he has indirectly played 
an influential part in building up this set of courses, because some of our best 
instructors are his former students. 

Listening to Professor Hotelling’s paper, I was thinking of the possibility 
that some of his recommendations might be misunderstood. I take it that they 
are not supposed to embody all that there is in the teaching of statistics, because 
there are many other neglected phases that ought to be stressed. In the Bureau 
of the Census the population division alone has augmented its force by ap- 
proximately 3500 statistical clerks during the past six months. They come from 
diverse schools and it has been interesting to observe how many of them have the 
idea that all the problems of sampling and inference from data can be solved by 
what are commonly known as modern statistical techniques—correlation co- 
efficients, rank correlation coefficients, chi-square, analysis of variance, con- 
fidence limits, and the like. Most of them are shocked to learn that many of 
the so-called modern “theories of estimation” are not theories of estimation at 
all, but are rather theories of distribution and are a disappointment to one who is
faced with the necessity of making a prediction from his data, i.e., of basing
some critical course of action on them. The conviction that such devices as
confidence limits and Student’s t provide a basis for action regardless of the
size of the sample whence they were computed, even under conditions of statis- 
tical control, is too common a fallacy. On the other hand, many simple but 
worthy devices are neglected. A histogram, for instance, can be a genuine 
tool of prediction if it is built up layer by layer in different legends so as to dis- 
tinguish the different sources whence the data are derived. The modern student, 
and too often his teacher, overlook the fact that such a simple thing as a scatter 
diagram is a more important tool of prediction than the correlation coefficient, 
especially if the points are labeled so as to distinguish the different sources of the 
data. Most students do not realize that for purposes of prediction the con- 
sistency or lack of it between many small samples may be much more valuable 
than any probability calculations that can be made from them or from the entire 
lot. Students are not usually admonished against grouping data from heterog- 
eneous sources. Of those that are not guilty of indiscriminate grouping, many 
are inclined to rely on statistical tests for distinguishing heterogeneity, rather 
than on a careful consideration of the sources of the data. Too little attention 
is given to the need for statistical control, or to put it more pertinently, since 
statistical control (randomness) is so rarely found, too little attention is given 
to the interpretation of data that arise from conditions not in statistical control. 
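
The warning against grouping data from heterogeneous sources can be illustrated with a toy calculation. The sketch below is an illustration added here, not taken from the discussion: the two "sources" and their values are invented purely for the example, and `pearson` is a helper written for it. Within each source the two variables move in opposite directions, yet the pooled correlation coefficient is strongly positive, a contradiction that a scatter diagram with the points labeled by source would show at a glance.

```python
import math


def pearson(xs, ys):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented observations from two hypothetical sources, A and B.
source_a = ([1, 2, 3], [3, 2, 1])
source_b = ([6, 7, 8], [8, 7, 6])

r_a = pearson(*source_a)
r_b = pearson(*source_b)
r_pooled = pearson(source_a[0] + source_b[0], source_a[1] + source_b[1])

print(f"within source A: r = {r_a:+.2f}")
print(f"within source B: r = {r_b:+.2f}")
print(f"pooled, sources ignored: r = {r_pooled:+.2f}")
```

The pooled coefficient reflects only the difference in level between the two sources, not the relation that holds within either of them, which is why attention to the origin of each observation must come before any calculation on the combined lot.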

Nevertheless, the fundamentals of probability and sampling theory, and the 
mathematics of the distribution functions, though by themselves they do not 
qualify anyone for high-grade statistical work, are ultimately essential for pro- 
ficiency in statistics. Since they are seldom learned away from the university 
they are properly made the main theme of teaching. The university is the 
place to learn the studies that are so difficult to get outside of it. 

Above all, a statistician must be a scientist. The skepticism of many first 
class scientists of today for modern statistical methods should be a challenge to 
statistical teaching. A scientist does not neglect any pertinent information, 
yet students of statistics are often taught to do just the opposite of this, and are 
accused of being old-fashioned for daring to think of combining experience with 
the new information provided by a sample, even if it is a pitifully small one. 
Statisticians must be trained to do more than to feed numbers into the mill and
grind out probabilities; they must look carefully at the data, and take account
of the conditions under which each observation arises. It is my feeling that
the chief duty of a statistician is to help design experiments in such a way
that they provide the maximum knowledge for purposes of prediction; another
is to compile data with the same object in view; and still a third function is
to help bring about some changes in the source of the data. Scientific data 
are not taken merely for inventory purposes. There is no use taking data if 
you don’t intend to do something about the sources whence they arise. 


BUREAU OF THE CENSUS, 
WASHINGTON 







RESOLUTIONS ON THE TEACHING OF STATISTICS 


The Institute of Mathematical Statistics at its business meeting on September 
11, 1940 at Dartmouth College adopted the following resolutions regarding the 
teaching of statistics. The resolutions were drawn up by a committee appointed 
by the President, and consisting of Burton H. Camp, W. Edwards Deming, 
Harold Hotelling, and Jerzy Neyman.

1. If the teaching of statistical theory and methods is to be satisfactory, it 
should be in the hands of persons who have made comprehensive studies of the 
mathematical theory of statistics, and who have been in active contact with 
applications in one or more fields. 

2. The judgment of the adequacy of a teacher’s knowledge of statistical 
theory must rest initially on his published contributions to statistical theory, in 
contrast with mere applications, in a manner analogous to that long accepted in 
other university subjects. 

3. These ideas are expressed in detail in the paper The teaching of statistics, 
by Professor Harold Hotelling, and the Institute decides to give both the 
resolution and the paper as wide a circulation as possible. 










REPORT OF THE HANOVER MEETING OF THE INSTITUTE 


The sixth meeting of the Institute of Mathematical Statistics was held at 
Dartmouth College, Hanover, New Hampshire, Tuesday to Thursday, Sep- 
tember 10 to 12, 1940, in conjunction with meetings of the American Mathe- 
matical Society and of the Mathematical Association of America. The fol- 
lowing forty-two members of the Institute attended the meeting: 


H. E. Arnold, Felix Bernstein, G. W. Brown, J. H. Bushey, B. H. Camp, A. T. Craig, 
A. R. Crathorne, J. H. Curtiss, J. F. Daly, W. E. Deming, J. L. Doob, Churchill Eisenhart, 
M. L. Elveback, C. H. Fischer, M. M. Flood, R. M. Foster, T. C. Fry, H. P. Geiringer, 
Robert Henderson, E. H. C. Hildebrandt, G. M. Hopper, Harold Hotelling, E. V. Hunting- 
ton, M. H. Ingraham, Dunham Jackson, W. L. Kichline, L. F. Knudsen, B. A. Lengyel, 
W. G. Madow, J. W. Mauchly, Richard von Mises, E. B. Mode, Jerzy Neyman, P. S. Olm- 
stead, Oystein Ore, M. M. Sandomire, L. W. Shaw, F. F. Stephan, A. G. Swanson, Abra- 
ham Wald, S. S. Wilks, Jacob Wolfowitz. 


The meeting of the Institute consisted of four sessions. At the first session, 
which was held on Tuesday morning, Professor Harold Hotelling of Columbia 
University delivered an address on The Teaching of Statistics. This address 
was followed by considerable discussion on the various aspects of the teaching
of statistics.¹ Preceding Professor Hotelling’s address a short paper on an
Empirical Comparison of the “Smooth” test for goodness of fit with Pearson’s
Chi-Square test was presented by Professor J. Neyman of the University of 
California. 

Following Professor Hotelling’s address a business meeting of the Institute 
was held. At this time resolutions on the teaching of statistics were approved 
(see p. 472). The President reported that a War Preparedness Committee 
had been appointed in the summer to study the matter of the Institute’s
participation in the national defense program.² The Chairman of this Committee
submitted a preliminary report which met the approval of the Institute. A 
plan was approved for completing the report and circularizing it with a minimum 
of delay. 

The matter of the organization of local sections or chapters of the Institute 
was discussed but no action was taken. 


1 Professor Hotelling’s address and three resolutions regarding the teaching of Statis- 
tics which were adopted by the Institute at a business meeting following the address are 
published in the present issue of the Annals of Mathematical Statistics, pp. 457-472. 

2 The membership of the Committee is as follows: 

Professor Churchill Eisenhart (Chairman), University of Wisconsin. 
Professor A. T. Craig, University of Iowa.

Professor E. G. Olds, Carnegie Institute of Technology. 

Captain Leslie E. Simon, Aberdeen Proving Ground. 

Mr. Ralph E. Wareham, General Electric Company. 




On Tuesday afternoon a session on contributed papers in Mathematical 
Statistics was held jointly with the American Mathematical Society. Pro- 
fessor B. H. Camp of Wesleyan University presided and the following papers 
were presented: 


1. Contributions to the theory of the representative method of sampling.
Dr. W. G. Madow, Department of Agriculture, Washington.
2. A generalization of the law of large numbers.
Dr. Hilda P. Geiringer, Bryn Mawr College.
3. On the problem of two samples from normal populations with unequal variances.
Professor S. S. Wilks, Princeton University.
4. Experimental determination of the maximum of an empirical function.
Professor Harold Hotelling, Columbia University.
5. Asymptotically shortest confidence intervals.
Dr. Abraham Wald, Columbia University.
6. Reduction of certain composite statistical hypotheses.
Dr. G. W. Brown, R. H. Macy and Company, Inc., New York.
7. Conception of equivalence in the limit of tests and its application to certain λ and χ²
tests.
Professor J. Neyman, University of California.
Abstracts of these papers follow this report. 


On Wednesday morning a session was held on The Theory of Probability 
with Dr. T. C. Fry of the Bell Telephone Laboratories, in the chair. The 
following addresses were given: 


1. On the foundations of probability theory. 
Professor R. von Mises, Harvard University. 
2. Probability as measure. 
Professor J. L. Doob, University of Illinois. 


This session was followed by an energetic discussion which was continued in an 
informal afternoon session. 

The Thursday morning session was devoted to the Theory of Statistical Esti- 
mation with Professor Harold Hotelling as Chairman. The following addresses 
were given: 


1. Estimation by intervals as a classical problem in probability.
Professor J. Neyman, The University of California.
2. Statistical estimation in large samples.
Dr. Joseph F. Daly, The Catholic University of America.


On Monday at 4:15 p.m. a tea was held at the Graduate Club for members 
of the mathematical organizations and their guests, and on Monday at 8:00 a 
musical performance was presented. On Tuesday at 7:00 p.m. a joint dinner 
was held for the mathematical organizations in Thayer Hall. Wednesday 
afternoon was devoted to an excursion to Franconia Notch. 

During the meeting a collection of string models of ruled surfaces was ex- 
hibited by Professor Robin Robinson of Dartmouth College and electrical 
calculation apparatus made from telephone equipment was exhibited by mem- 
bers of the staff of the Bell Telephone Laboratories. 





















ABSTRACTS OF PAPERS 
(Presented on September 10, 1940, at the Hanover meeting of the Institute) 


Contributions to the Theory of the Representative Method of Sampling. 
William G. Madow, Washington, D. C.


The theory of representative sampling may be regarded as a dual sampling process; the 
first of which consists in the sampling of different random variables and the second of which 
consists in repeating several times the experiments associated with each of the different 
random variables. It follows that while the theory of sampling from finite populations 
without replacement may be required for the first process, the second leads directly into 
the theory of sampling from infinite populations. There is, however, one difference. 
Although the usual theory is concerned with the evaluation of fiducial or confidence limits 
for parameters the theory of sampling is concerned with the evaluation of fiducial or confi- 
dence limits for, say, the mean of a sample of N, when n, (N > n), of the values are known. 

It is thus possible to use the usual theories of estimation in obtaining estimates of the 
parameters and to allow the effects of subsampling process to show themselves in the 
different values of the fiducial limits. It is shown that the limits obtained are almost 
identical with those obtained by the theory of sampling from a finite population. Distri- 
butions of the statistics used in these limits are derived. 

Besides these results, the theory is extended to the theory of sampling vectors, and
conditions are stated under which the “best” allocation of the number in a sample among several
strata is proportional to the kth roots of the generalized variance of a random vector 
having k components. 


A Generalization of the Law of Large Numbers. Hilda Geiringer, Bryn
Mawr.


Let $V_1(x), V_2(x), \cdots, V_n(x)$ be $n$ probability distributions which are not supposed to
be independent and let $F(x_1, x_2, \cdots, x_n)$ be a “statistical function” of $n$ observations
in the sense of v. Mises, $V_i(x)$ $(i = 1, 2, \cdots, n)$ indicating as usual the probability of
getting a result $\le x$ at the $i$th observation. Then it can be proved that under fairly
general conditions $F(x_1, x_2, \cdots, x_n)$ converges stochastically toward its “theoretical
value”; or in other words, that under these general conditions a great class of statistics
$F(x_1, x_2, \cdots, x_n)$ is “consistent” in the sense of R. A. Fisher.

Well known particular cases of this theorem result if (a) we take for $F(x_1, x_2, \cdots, x_n)$
the average $(x_1 + x_2 + \cdots + x_n)/n$ of the $n$ observations, (b) we assume that the $V_i(x)$
are independent distributions.
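
As a purely illustrative sketch, not taken from the abstract, case (a) can be simulated for observations that are not independent. The first-order autoregressive dependence, the function name `dependent_sample_mean`, and every parameter value below are assumptions chosen only for the illustration; the theoretical value of the average is `mu`.

```python
import random


def dependent_sample_mean(n, mu=2.0, rho=0.5, seed=0):
    """Average of n dependent observations x_i = mu + e_i.

    The disturbances follow e_i = rho * e_{i-1} + z_i with independent
    standard normal z_i.  This dependence scheme is an assumption made
    for illustration; it is not the setting of the abstract.
    """
    rng = random.Random(seed)
    e = 0.0
    total = 0.0
    for _ in range(n):
        e = rho * e + rng.gauss(0.0, 1.0)  # carry dependence forward
        total += mu + e
    return total / n


if __name__ == "__main__":
    # The averages should settle near mu as n grows.
    for n in (100, 10_000, 100_000):
        print(n, dependent_sample_mean(n))
```

Under these assumptions the printed averages should drift toward `mu` as `n` increases, which is the behaviour the theorem describes for a much wider class of statistics.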


On the Problem of Two Samples from Normal Populations with Unequal
Variances. S. S. Wilks, Princeton University.


Suppose $O_{n_1}$ and $O_{n_2}$ are samples of $n_1$ and $n_2$ elements from normal populations $\pi_1$ and
$\pi_2$ respectively. Let $a_1, \sigma_1^2$ and $a_2, \sigma_2^2$ be the means and variances of $\pi_1$ and $\pi_2$ and let
$O_{n_1}$ and $O_{n_2}$ have means $\bar{x}_1$ and $\bar{x}_2$ and variances $s_1^2$ and $s_2^2$ (unbiased estimates of $\sigma_1^2$, $\sigma_2^2$)
respectively. It is shown that there exists no function (Borel measurable) of $\bar{x}_1$, $\bar{x}_2$,
$s_1^2$, $s_2^2$, $a_1 - a_2$ independent of $\sigma_1$ and $\sigma_2$, having its probability law independent of the
four population parameters. It is therefore impossible to obtain exact confidence limits
for $a_1 - a_2$ corresponding to a given confidence coefficient. Functions of the four parameters
and four statistics are devised from which one can set up confidence limits for $a_1 - a_2$
with associated confidence coefficient inequalities.


Experimental Determination of the Maximum of an Empirical Function. 
Harold Hotelling, Columbia University.


In physical and economic experimentation to determine the maximum of an unknown 
function, for example of a monopolist’s profit as a function of price, or of the magnetic 
permeability of an alloy as a function of its composition, the characteristic procedure is to 
perform experiments with chosen values of the argument $x$, each of which then yields an
observation, subject to error, on the corresponding functional value $y = f(x)$. The values
of $x$ need, however, to be chosen on the basis of earlier experiments in order to make the
determination efficient. The experimentation properly proceeds, therefore, in successive
stages, with the values used at each stage determined with the help of the earlier work.
The question what distribution of $x$ as a function of previous results should be used is
discussed in this paper on the basis of various hypotheses regarding the function, and 
further criteria. In particular, a conflict is shown to exist under some conditions between 
the criterion of minimum sampling variance and that calling for absence of bias. 


Asymptotically Shortest Confidence Intervals. Abraham Wald, Columbia
University.


Let $f(x, \theta)$ be the probability density function of a variate $x$ involving an unknown
parameter $\theta$. Denote by $x_1, \cdots, x_n$ $n$ independent observations on $x$ and let $C_n(\theta)$ be a
positive function of $\theta$ such that the probability that

$$\left| \frac{1}{\sqrt{n}} \frac{\partial}{\partial\theta} \sum_{\alpha=1}^{n} \log f(x_\alpha, \theta) \right| < C_n(\theta)$$

is equal to a constant $\beta$ under the assumption that $\theta$ is the true value of the parameter.
Denote by $\theta'(x_1, \cdots, x_n)$ the root in $\theta$ of the equation

$$\frac{1}{\sqrt{n}} \frac{\partial}{\partial\theta} \sum_{\alpha=1}^{n} \log f(x_\alpha, \theta) = C_n(\theta)$$

and by $\theta''(x_1, \cdots, x_n)$ the root of

$$\frac{1}{\sqrt{n}} \frac{\partial}{\partial\theta} \sum_{\alpha=1}^{n} \log f(x_\alpha, \theta) = -C_n(\theta).$$

Under some weak assumptions on $f(x, \theta)$ the interval
$\delta_n(x_1, \cdots, x_n) = [\theta'(x_1, \cdots, x_n), \theta''(x_1, \cdots, x_n)]$
is in the limit with $n \to \infty$ a shortest unbiased confidence interval¹ of $\theta$ corresponding to
the confidence coefficient $\beta$. This confidence interval is identical with that given by S. S.
Wilks in his paper “Shortest average confidence intervals from large samples,” The Annals
of Mathematical Statistics, Sept. 1938. Wilks has shown that $\delta_n(x_1, \cdots, x_n)$ is asymptotically
shortest in the average compared with all confidence intervals computed on the
basis of statistics belonging to a certain class C. In the present paper it has been proved
that the confidence interval in question is asymptotically shortest compared with any
arbitrary unbiased confidence interval, without any restriction to a certain class of
functions.
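
A minimal sketch of the construction for one concrete case may help. It assumes, purely for illustration, that the observations follow a Poisson law with mean θ, that the derivative of the log-likelihood is normalised by 1/√n, and that C_n(θ) is replaced by its large-sample normal approximation z/√θ, where z is the two-sided normal point for the coefficient β. None of these choices comes from the abstract, and the function name `poisson_score_interval` is hypothetical. With these assumptions the two defining equations become the quadratic n(x̄ − θ)² = z²θ, whose smaller and larger roots play the parts of θ′ and θ″.

```python
import math
from statistics import NormalDist


def poisson_score_interval(xs, beta=0.95):
    """Large-sample interval [theta', theta''] for a Poisson mean.

    Assumptions (illustrative only): Poisson observations, and C_n(theta)
    approximated by z / sqrt(theta) with z the normal point for beta.
    The roots solve n * (xbar - theta)**2 == z**2 * theta.
    """
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    xbar = sum(xs) / n
    z = NormalDist().inv_cdf((1.0 + beta) / 2.0)
    centre = xbar + z * z / (2.0 * n)
    half = z * math.sqrt(xbar / n + z * z / (4.0 * n * n))
    return centre - half, centre + half


if __name__ == "__main__":
    counts = [3, 5, 4, 6, 2, 4, 5, 3, 4, 4]  # made-up data
    print(poisson_score_interval(counts, beta=0.95))
```

The sketch only shows how the two roots are obtained once C_n(θ) is fixed; it says nothing about the optimality property proved in the paper.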





Reduction of Certain Composite Statistical Hypotheses. George W. Brown,
R. H. Macy and Co., New York. 


The results obtained make it possible to reduce a large class of composite statistical 
hypotheses to equivalent simple hypotheses. The fundamental theorem established states 
essentially that if two distributions give rise, in sampling, to the same distribution of the 


1 For the definition of a shortest unbiased confidence interval see the paper by J. Neyman,
“Outline of a theory of statistical estimation based on the classical theory of probability,”
Phil. Trans. Roy. Soc. (1937).










set of differences between observations, then one distribution must be a translation of the
other, subject to a condition requiring that the characteristic function of one of the
distributions be such that any interior intervals of zeros be not too large. The result is
established by means of the functional equation
$\varphi(t_1)\varphi(t_2)\varphi(-t_1 - t_2) = \psi(t_1)\psi(t_2)\psi(-t_1 - t_2)$
relating the characteristic functions. Similar results are obtained for scale, and
combination of location and scale, and the corresponding situations in multivariate
distributions. This type of uniqueness theorem permits one to reduce a composite hypothesis
involving an unknown location parameter (or scale, or both) to an equivalent simple
hypothesis.


Conception of Equivalence in the Limit of Tests and Its Application to Certain
$\lambda$- and $\chi^2$-Tests. J. Neyman, University of California.


Denote by $E$ a system of observable variables and by $N$ the number of independent
observations of those variables to be used for testing a certain statistical hypothesis $H$
against a set $\Omega$ of admissible simple hypotheses $h$. Let further $T_1(N)$ and $T_2(N)$ be two
different tests of $H$ using the same number $N$ of observations. Consider the probability
$P_N(h)$, calculated on any admissible simple hypothesis $h$, of the two tests contradicting
themselves.

Definition: If, whatever be $h \in \Omega$, the probability $P_N(h)$ tends to zero as $N$ is indefinitely
increased, then the two tests are said to be equivalent in the limit.

Consider a number $s$ of series of independent trials and denote by $E_{i1}, E_{i2}, \cdots, E_{im_i}$
all the $m_i$ possible and mutually exclusive outcomes of each of the trials forming the $i$th
series. Let $p_{ij}$ be the probability of $E_{ij}$, $n_i$ the total number of trials in the $i$th series,
and $n_{ij}$ the number of these which give the outcome $E_{ij}$.

Suppose that it is desired to test a composite hypothesis $H$ concerning all the probabilities
$p_{ij}$ and consisting of the assumption that any one of them is a given linear function
of some $t$ independent parameters $\theta_1, \cdots, \theta_t$, so that

(1) $p_{ij} = a_{ij0} + a_{ij1}\theta_1 + \cdots + a_{ijt}\theta_t$

where the coefficients $a_{ijk}$ are known. The main result of the paper is then that the $\lambda$-test
of the above hypothesis $H$, tested against the set $\Omega$ of alternatives ascribing to the $p_{ij}$
any non-negative values, is equivalent in the limit to the test consisting of rejecting $H$
when the minimum of the expression

(2) $\chi^2 = \sum_{i=1}^{s} \sum_{j=1}^{m_i} \frac{(n_{ij} - n_i p_{ij})^2}{n_{ij}}$

calculated with respect to unrestricted variation of the $\theta$'s, exceeds the tabled value of $\chi^2$
corresponding to the chosen level of significance $\epsilon$ and to the number of degrees of freedom
$\sum_{i=1}^{s} m_i - s - t$.

It will be noticed that the expression (2) differs from the usual $\chi^2$ in the denominator
of each term.

As an example of the application of the test based on (2), consider the case where $M$
varieties of sugar beet are tested for resistance to a certain disease in an experiment
arranged in $N$ randomized blocks. Denote by $n$ the number of beets selected at random
for inspection from each plot and by $n_{ij}$ the number of those of the $i$th variety from the
plot in the $j$th block which are found to be infected. Denote further by $p_{ij}$ the proportion
of infected beets of the $i$th variety in the plot in the $j$th block. The hypothesis that the
effects of variety and of block are additive is expressed by $p_{ij} = p + V_i + B_j$ with
$\sum V_i = \sum B_j = 0$. To test this hypothesis we may use (2) which in this particular case
reduces itself to

(3) $\chi^2 = \sum_{i=1}^{M} \sum_{j=1}^{N} w_{ij} (q_{ij} - p - V_i - B_j)^2$

with $w_{ij} = n^3/\{n_{ij}(n - n_{ij})\}$, $q_{ij} = n_{ij}/n$. The minimum $\chi_0^2$ of $\chi^2$ is found by solving a
set of equations which are linear in $p$, $V_i$, $B_j$ and the comparison of $\chi_0^2$ with the tabled
value corresponding to $(M - 1)(N - 1)$ degrees of freedom will tell us whether we are
likely to be very wrong in assuming additivity or not. In the favorable case we may
next proceed similarly to test another hypothesis that there is no differentiation between
the varieties, so that $V_1 = V_2 = \cdots = V_M = 0$.
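
The reduction of (2) to (3) can be checked in a few lines. The step below is a reconstruction added for the reader, not part of the abstract. It assumes that each plot is treated as one series of $n$ trials with the two outcomes "infected" and "not infected", so that $s = MN$ and every $m_i = 2$.

```latex
\begin{align*}
\frac{(n_{ij} - n p_{ij})^2}{n_{ij}}
  + \frac{\bigl[(n - n_{ij}) - n(1 - p_{ij})\bigr]^2}{n - n_{ij}}
  &= (n_{ij} - n p_{ij})^2
     \left(\frac{1}{n_{ij}} + \frac{1}{n - n_{ij}}\right) \\
  &= \frac{n\,(n_{ij} - n p_{ij})^2}{n_{ij}(n - n_{ij})}
   = \frac{n^3}{n_{ij}(n - n_{ij})}\,(q_{ij} - p_{ij})^2 ,
   \qquad q_{ij} = \frac{n_{ij}}{n}.
\end{align*}
```

Putting $p_{ij} = p + V_i + B_j$ and summing over plots gives (3) with the weights $w_{ij} = n^3/\{n_{ij}(n - n_{ij})\}$. Under the same assumption the degrees of freedom are $2MN - MN - (M + N - 1) = (M - 1)(N - 1)$, since the additive hypothesis leaves $M + N - 1$ free parameters.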


Empirical Comparison of the “Smooth” Test for Goodness of Fit with the
Pearson’s $\chi^2$ Test. J. Neyman, University of California.


In a previous publication² the author has deduced a test for goodness of fit, described
as the “smooth test” or the $\psi^2$ test, applicable to cases where the hypothesis tested $H$
is simple. The test is so devised as to be particularly sensitive to departures from $H$
which are “smooth” in the sense explained in detail in the publication quoted. Whether
the test so devised does present any advantage over the usual $\chi^2$ test depends on how
frequently we meet, in practice, cases where the hypotheses alternative to the one tested
are actually smooth.

The present investigation was undertaken with the object of obtaining some information
on this point. For that purpose a number of cases described in the literature where there
was a question of testing that some observable variable $x$ follows some perfectly specified
distribution $p(x)$ were analyzed. Of all such cases, the ones where there were a priori
theoretical reasons to believe that $p(x)$ could not possibly represent the true distribution
of $x$ and, at the most, it could be considered as only an approximation to the true
distribution were selected.

It was assumed that the departures from the hypothetical distributions are typical of
those that may be met in practice when no definite information as to the actual state of
affairs is available. The hypothesis of goodness of fit was tested both by means of the
$\chi^2$ and by the fourth order smooth test. Out of the 130 cases studied the two tests were
in perfect agreement eight times. Out of the remaining 122 cases the smooth test proved
to be more sensitive than the $\chi^2$ in 70 cases and the $\chi^2$ better than the smooth test in 52
cases. We may further compare the tests by counting those cases where one of them
detected the falsehood of the hypothesis tested at a given level of significance while the
other failed to do so. At the level of significance .05 the $\chi^2$ test rejected the hypothesis
tested 13 times, while $P_{\psi^2}$ was $> .05$. The reverse was true in 17 cases. At the level of
significance .01 the corresponding figures are 5 and 14, again in favor of the smooth test.


2 J. Neyman, “‘Smooth Test’ for Goodness of Fit.” Skandinavisk Aktuarietidskrift,
1937, pp. 149-199.













REPORT OF THE WAR PREPAREDNESS COMMITTEE OF THE 
INSTITUTE OF MATHEMATICAL STATISTICS 


The generally recognized functions of a statistician are the calculation of 
averages, percentages, and index numbers; the construction of bar graphs and 
pie diagrams; and the compilation of data in general. His other activities 
are less widely known. In particular, the recent advances in mathematical sta- 
tistics are known to a relatively small proportion of the persons occupying 
responsible positions in academic life, in industry, and in government. The 
mathematical statistician, in fact, is concerned chiefly with the interpretation 
of data through the use of probability theory; his is the science of reasoning 
from a part to the whole, and of prediction; and to him falls the task of stating 
the conditions under which such inferences are possible, of devising means of 
testing whether these conditions are satisfied, and of evaluating the prob- 
ability that such ‘uncertain inferences’ are correct in specific instances. Fur- 
thermore, it is his responsibility to so plan the lay-out of experiments and the 
conduct of surveys that the data they yield will contain the maximum informa- 
tion on the points at issue and be amenable to unambiguous statistical 
interpretation. 

Because of the functions which the mathematical statistician can perform his
services should be of value to the National Defense Program in the following
fields:


I. Quality Control and Specification. The functions of a mathematical 
statistical nature connected with quality control and specification of articles 
produced by mass production are: 

(1) Tests of randomness. These are important because statistical methods 
of inference are strictly valid only for random samples. 

(2) The use of probability theory in predicting the outcome of future repetitions 
of an operation which is in a state of statistical control.¹ The evaluation of the
probability that the quality of a piece of product will lie within any previously 
specified tolerance limits as long as a state of statistical control is maintained, 
and the development of sampling inspection techniques are examples of this 
function. 


1 A repetitive operation, such as a production process, is said to be in a state of statistical 
control when it produces a sequence of observations which exhibit the property of
randomness. An important aspect of quality control is the improvement of quality which comes
as the result of an effort to reduce a manufacturing process to a state of statistical control. 
Furthermore, when this state of control is attained it is possible to gain a reduction in 
cost of inspection, a reduction in cost of rejections, a reduction in tolerance limits where 
quality measurement is indirect, and the attainment of uniform quality even though the 
inspection test is destructive. 




(3) Representative sampling. When a repetitive operation such as a produc- 
tion process is not in a state of statistical control, it is not possible to make 
valid inferences about the quality of a lot from an examination of a sample 
from the lot unless the sampling process is one of random selection within 
“strata” in accordance with the principles of representative sampling. 

(4) Analysis of variance. Reference is made here to the technique whereby 
the total variability of a product of an operation which is in a state of statis- 
tical control can be decomposed into components associated with the various 
sub-operations involved. 

(5) Correlation methods. When a direct measurement of quality is extremely 
costly, it is sometimes advisable to use as an indirect measurement of quality 
the value of some character less costly to measure which is highly correlated 
with quality. 

(6) Specification of quality as a variable. Statistical theory, including tests 
for randomness, must be taken into account in writing quality specifications if 
the consumer is to be protected against the vagaries of sampling and the pro- 
ducer safeguarded from the incurring of penalties of an unjust chance. 
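
Tests of randomness, mentioned under (1) and (6) above, can take many forms, and the report does not name a particular one. As one hedged illustration, the sketch below counts runs above and below the sample median and compares the count with its expectation under randomness, using the standard mean and variance of the number of runs for two kinds of elements. The choice of this test, the function name `runs_test`, and the data are assumptions made for the example.

```python
import math
from statistics import median


def runs_test(xs):
    """Runs above and below the median (illustrative sketch).

    Returns (runs, expected_runs, variance, z).  Values equal to the
    median are dropped, which is one common convention and an assumption.
    """
    m = median(xs)
    signs = [x > m for x in xs if x != m]
    n1 = sum(signs)              # observations above the median
    n2 = len(signs) - n1         # observations below the median
    n = n1 + n2
    if n1 == 0 or n2 == 0 or n < 3:
        raise ValueError("too few observations on one side of the median")
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, variance, z


if __name__ == "__main__":
    drifting = [1, 2, 3, 4, 5, 6, 7, 8]  # made-up data with a trend
    print(runs_test(drifting))
```

A steadily drifting sequence produces far fewer runs than expected, so a large negative `z` is a warning that the process may not be in a state of statistical control.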


II. Sampling Surveys. The importance of conducting sampling surveys 
in accordance with the principles of representative sampling is well established. 
It is quite possible that such surveys and partial censuses will be needed in 
connection with the National Defense Program in order to determine the 
frequency and location of individuals possessing special traits, e.g. persons 
capable of withstanding the rigours of dive bombing, or persons possessing 
types of color blindness which render them valuable as observers who can 
detect camouflage, etc. The “problem of sizes” connected with Stores and 
Supplies—see below—may require careful preliminary surveys. Also, surveys 
may be needed to evaluate the effects of various types of propaganda. 


III. Experimentation of Various Kinds. The mathematical statistician 
can be of service in connection with experimentation of various kinds under- 
taken as a part of the National Defense Program since the following aspects 
of experimentation are of a mathematical statistical nature: 

(1) Randomization. Since statistical tests for the existence of differences 
between samples, of correlation, etc. are strictly valid only for random samples, 
the operation of randomization is of paramount importance in “the comparison 
of new designs, new materials or alloys, study of contact phenomena under 
different conditions, corrosion of materials under different atmospheric conditions,
and field trial of equipment, to mention only a few.” If randomization
is not undertaken, observed differences between designs, for instance, may have 
arisen from non-random assignable differences in the material presented. Fur- 
thermore, the validity of tests for significant differences between the effects 
of various designs rests upon the condition that the variability observed in 
the effects of each design be of random character and free from trends and 
non-random shifts in magnitude—i.e. the operation of determining the effects 



























of each design must be in a state of statistical control, to use a phrase employed 
in quality control. 

(2) Experimental design. Without careful attention to the lay-out of an 
experiment, the data it yields may be difficult and even impossible to interpret. 
Therefore, the principles of experimental design set forth by R. A. Fisher and 
his followers are of great importance, as are also the special experimental ar- 
rangements which have been devised to cope with many of the more usual 
difficulties met in practice. 


IV. Personnel Selection. The allocation of individuals to places where 
they can be of greatest value in the National Defense Program will undoubt- 
edly require tests for mental and physical traits. Although the development 
and analysis of such tests is largely in the hands of psychometric groups, the 
use of methods of multivariate statistical analysis in such work renders this 
field one in which mathematical statistics ought to play an important role. 


It is in the above four fields that there is special need for the training and 
endowments of the mathematical statistician. He can also render valuable 
assistance in the following fields: 


V. Stores and Supplies. 

(1) Problem of sizes. Preliminary surveys are likely to prove useful in 
ascertaining the relative frequencies of demand for the respective sizes of cloth- 
ing, etc. in different parts of the country. 

(2) Development of procedures for charting the day to day location and move- 
ment of stores and supplies. 

(3) Problem of replacement of parts and equipment. In many cases it is more
economical to make replacement at statistically determined times, than to wait
for complete failure. 


VI. Transportation and Communication. Probability theory has shown 
its usefulness in peace time in handling “traffic” problems that arise in telephone
and telegraph communication, electric power distribution, etc. No doubt it 
will find corresponding application to problems in these fields arising out of the 
National Defense Program. 


VII. Gunnery and Bombing. Although there is a need in connection with 
artillery fire for further development of methods of estimating standard devia- 
tions from successive differences in order to minimize the biases arising from 
slowly changing conditions during the period of firing, the principles of artillery 
fire are quite firmly established and the relatively new science of bombing is 
likely to present greater opportunities for the application of the methods of 
mathematical statistics. For instance, in evaluating bombing techniques 
there is need of statistical methods in separating the constant biases from the 
random variability. 
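
The idea of estimating a standard deviation from successive differences can be sketched briefly. For independent observations with a constant mean, the expected squared difference between neighbours is twice the variance, so half the mean squared successive difference estimates the variance, and a slow drift in the mean affects it much less than it affects the usual estimate. The function names and the data below are assumptions for illustration only; the report does not give a formula.

```python
import math


def sd_from_successive_differences(xs):
    """Standard deviation estimated from successive differences.

    Uses sqrt( sum (x[i+1] - x[i])**2 / (2 * (n - 1)) ).
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    d2 = sum((b - a) ** 2 for a, b in zip(xs, xs[1:]))
    return math.sqrt(d2 / (2.0 * (n - 1)))


def sd_usual(xs):
    """Usual estimate from squared deviations about the sample mean."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


if __name__ == "__main__":
    drifting = [0.0, 1.0, 2.0, 3.0, 4.0]  # made-up series with pure drift
    print(sd_from_successive_differences(drifting), sd_usual(drifting))
```

On a drifting series the usual estimate is inflated by the drift, while the successive-difference estimate is inflated far less, which is the bias reduction the passage refers to.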













VIII. Meteorology. The extent to which statistical methods are being 
employed in meteorology can be seen from an examination of the Monthly 
Weather Review Supplement No. 39, issued April 1940, and entitled “Reports
on Critical Studies of Methods of Long-Range Weather Forecasting.” There
seems to be excellent opportunity here for the application of methods of multi- 
variate analysis and for the development and uses of methods applicable to 
serially correlated data. Such work would be of value in National Defense 
so far as it would enable the forecasting of conditions suitable for launching an 
attack. 


IX. Medicine. The National Defense Program will probably require the pre- 
paration and storage of hormone substances, toxic compounds, drugs, and other 
medicinal supplies. Since many such are examined for potency, toxicity, etc. 
by means of animal assays, there will be considerable opportunity here for 
the sound application of mathematical statistics in planning and interpreting 
these bioassays. 





In nearly all of the above activities the application of mathematical statistics 
is likely to encounter two major difficulties: 

(1) Obtaining an adequate trial of the methods of mathematical statistics. 

(2) Supplying persons to occupy key positions in the application of mathe- 
matical statistics in a given field—persons competent in mathematical statis- 
tics and who possess a sound background in the field of application. 

In some of the above activities, e.g. Quality Control, there will be the further 
difficulty of 

(3) Supplying the vast number of slightly trained workers who will gather 
the data and perform the analyses. 





It is with these difficulties in mind that the Committee recommends that the 
Institute 

(1) Prepare a register of Institute members, stating for each member his 
background, interests, and experience so far as these relate to mathematical 
statistics and its applications;²

(2) Appoint a committee to handle inquiries concerning personnel qualified 
to deal with particular projects; 

(3) Cooperate to the fullest extent in matters pertaining to quality control 
and specification with the Joint Committee for the Development of Statistical 
Applications in Engineering and Manufacturing, of which the Institute is a 
sponsor;³





2 The preparation of this register should be coordinated with any similar undertaking 
sponsored by the National Roster of Scientific and Specialized Personnel, National Re- 
sources Planning Board, Executive Office of the President, Washington, D. C. 

3 We suggest the following as possible undertakings in a cooperative program with the
Joint Committee: 

(1) Requesting statements regarding the potential contribution to National Defense 













(4) Undertake such steps as are feasible which will lead to cooperation with 
other organizations having interests similar to those of the Institute, e.g. the 
American Statistical Association, the Psychometric Society, and the Econo- 
metric Society. 

(5) Establish contact with the National Defense Research Committee headed
by Dr. Vannevar Bush and coordinate the Institute’s activities with those
of this national Committee. 


In conclusion, we feel that as an organized group the Institute’s primary 
function in relation to the National Defense Program should be to serve as a 
reservoir of specialists, experienced in the use of the methods of mathematical 
statistics, who can direct the use of these methods and be of assistance in the 
development of new techniques as needed. As a secondary, but equally im- 
portant function, the Institute is in a position to supervise, and perhaps to 
undertake through the activities of its individual members, the training in 
mathematical statistics of the individuals who will be needed in the application 
of whatever statistical programs of the type noted above are undertaken in 
connection with the National Defense Program. It 1s recommended, therefore, 
that the Institute’s interest in the above activities, and its willingness to be called 
upon, be adequately publicized, possibly by sending copies of this report to various 
members of the Government, such as the Chief Signal Officer and the Coordina- 


of statistical methods in quality control and specification from men prominent in industry 
who are familiar with recent developments in quality control. Such individuals would 
be asked to give, where possible, concrete evidence of the value of such methods in their 
experience—evidence which would be helpful in securing authoritative acceptance of 
statistical methods in quality control and specification. 

(2) The organization of a syllabus on statistical methods for use in evening courses 
at various industrial centers. (Captain Simon of our Committee is preparing ‘‘An En- 
gineer’s Manual of Statistical Methods” which will be issued shortly.) 

(3) The preparation of a list of topics for inclusion in university courses. 

(4) The preparation of a list of suggested reading on statistical methods in quality 
control and specification, arranged under such headings as “‘expository,’’ ‘‘methodology,”’ 
etc. 

(5) The arrangement of local meetings and round table discussions at some of the uni- 
versities in a few large industrial centers. Some well known leader of the locality might 
serve as chairman. To such a meeting would be invited those men in local industries who 
were interested in the possibility of applying statistical methods to their problems, and 
the meeting could be thrown open to discussion after a brief paper outlining the accom- 
plishments of statistical methods of quality control in the speaker’s experience and stating 
the advantages to be gained by employing such methods in the mass production of the 
War Preparedness Program. 

(6) Sponsor the preparation of popular expository articles on quality control for in- 
dustrial journals, Readers Digest, Scientific American, etc., and other activities designed 
to popularize the subject and gain authoritative acceptance of statistical methods of 
quality control. 





484 WAR PREPAREDNESS COMMITTEE 


tor of National Defense Purchases and also to the secretaries of appropriate 
organizations, such as the American Standards Association, with the request 
that they advise the Institute of any specific action they feel the Institute 
should take. 


A. T. CraiGa 

E. G. OLDs 

L. E. Srwon 

R. E. WAREHAM 

C. E1rsENHART, Chairman. 





MATHEMATICAL REVIEWS 


an international journal to abstract and review current mathematical
literature, including the Theory of Probability and Theoretical
Statistics and applications


Volume 1 is now being published 


MICROFILM SERVICE
A microfilm copy of any paper reviewed can be obtained at a nominal cost. 
Subscription Rates: 


$6.50 per year to members of sponsoring organisations
$13.00 per year to others


Sponsored by the American Mathematical Society, the Mathematical 
Association of America, and other organisations 


Send subscriptions to 
MATHEMATICAL REVIEWS 
Brown University, Providence, Rhode Island, U. S. A.


JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


SEPTEMBER, 1940 - $1.50 PER COPY - $6.00 PER ANNUM - VOL. 35 - NO. 211


Hourly Earnings and Unit Labor Cost in Manufacturing .... IRVING H. SIEGEL
A Measure of Purchasing Power Inflation and Deflation .... MURRAY SHIELDS
Factors to be Considered in Measuring Intercity and Interregional Differences in
Living Costs .... FAITH M. WILLIAMS
The Wisconsin Committee on Statistics .... E. W. MOREHOUSE
Classification of Hospital Morbidity .... MARTA FRAENKEL
Factorial Design and Covariance in the Biological Assay of Vitamin D .... C. I. BLISS
Interdependence in a Series .... LILA F. KNUDSEN
Note on Tests of Departure from Normality .... MADOW
Statistical News and Notes 
Chapter Activities 
Book Reviews 


Address inquiries and orders for subscriptions and numbers to R. L. Funkhouser,
Secretary, American Statistical Association, 1626 K Street, N. W., Washington, D. C.





BIOMETRIKA 


A Journal for the Statistical Study of Biological Problems 
Volume XXXI, Parts III and IV 


CONTENTS 


I. On generalized analysis of variance. By P. L. Hsu. II. The derivation of the fifth and sixth moments
of the distribution of b₂ in samples from a normal population. By C. T. Hsu and D. N. Lawley. III.
Testing the homogeneity of a set of variances. By H. O. Hartley. IV. The simultaneous distribution in
samples of mean and standard deviation, and of mean and variance. By L. Truksa. V. Certain projective
depth and breadth measurements of the facial skeleton in man. By Alette Schreiner. VI. Transposition
of the viscera and other reversals of symmetry in monozygotic twins. By E. A. Cockayne. VII. The human
remains of the iron age and other periods from Maiden Castle, Dorset. By C. N. Goodman and G. M.
Morant. VIII. Homogeneity of results in testing samples from Poisson series. With an application to
testing clover seed for dodder. By J. Przyborowski and H. Wilenski. IX. On the method of paired
comparisons. By M. G. Kendall and B. Babington Smith. X. The mean and variance of χ², when used
as a test of homogeneity, when expectations are small. By J. B. S. Haldane. XI. A note on the statistical
analysis of sentence-length as a criterion of literary style. By C. B. Williams. XII. Applications of the
non-central t-distribution. By N. L. Johnson and B. L. Welch. Miscellanea: (i) Note on Stirling’s
Approximation. By E. V. Huntington. (ii) A note on the interpretation of quasi-sufficiency. By M. S.
Bartlett. (iii) The cumulants and moments of the binomial distribution, and the cumulants of χ² for a
(n × 2)-fold table. By J. B. S. Haldane.


The subscription price, payable in advance, is 45s. inland, 54s. export (per volume including postage). Cheques
should be drawn to Biometrika and sent to “The Secretary, Biometrika Office, Department of Statistics,
University College, London, W.C.1.” All foreign cheques must be in sterling and on a bank having a London
agency.


SANKHYĀ


The Indian Journal of Statistics 
Edited by P. C. Mahalanobis 


Vol. 4, Part 4, 1940


Articles on Theoretical Statistics 


A Sample Survey of the Acreage under Jute in Bengal .... P. C. MAHALANOBIS
Discussion on Planning of Experiments .... P. C. MAHALANOBIS
Technique of Random Sampling .... K. B. MADHAVA
The Use and Distribution of the Studentized D²-Statistic when the Variances and
Covariances are based on K Samples .... S. N. ROY AND R. C. BOSE
The Median in Tests by Randomization .... K. R. NAIR
Table of Confidence Interval for the Median in Samples from any Continuous
Population .... K. R. NAIR


A Study of Forty-three Years of Rainfall in Calcutta, 1893-1935 : 
A Note on Examination Marks . 


Articles in the following fields are also included: 


ECONOMIC AND BUSINESS STATISTICS


Annual subscription rate: 20 rupees (or 32 shillings) 
Inquiries and orders may be addressed to the Editor, 
Sankhyā, Presidency College, Calcutta, India.










THE INSTITUTE OF MATHEMATICAL STATISTICS 
(Organized September 12, 1935) 
OFFICERS FOR 1940 
President: 
S. S. Wilks, Princeton University, Princeton, N. J.


Vice-Presidents: 


C. C. Craig, University of Michigan, Ann Arbor, Michigan.
A. T. Craig, University of Iowa, Iowa City, Iowa.


Secretary-Treasurer: 
P. R. Rider, Washington University, St. Louis, Mo.


The purpose of the Institute of Mathematical Statistics is to stimulate 
research in the mathematical theory of statistics and to promote cooperation
between the field of pure research and the fields of application. 


Membership dues including subscription to the ANNALS OF MATHEMATICAL
STATISTICS are $5.00 per year. The dues and inquiries regarding membership
in the Institute should be sent to the Secretary-Treasurer of the
Institute.





